Leiden University Medical Center and University of Padova

Closed testing and partitioning are recognized as fundamental 
principles of familywise error control. In this paper, we argue that se- 
quential rejection can be considered equally fundamental as a general 
principle of multiple testing. We present a general sequentially rejec- 
tive multiple testing procedure and show that many well-known fam- 
ilywise error controlling methods can be constructed as special cases 
of this procedure, among which are the procedures of Holm, Shaffer 
and Hochberg, parallel and serial gatekeeping procedures, modern 
procedures for multiple testing in graphs, resampling-based multiple 
testing procedures and even the closed testing and partitioning pro- 
cedures themselves. We also give a general proof that sequentially 
rejective multiple testing procedures strongly control the familywise 
error if they fulfill simple criteria of monotonicity of the critical values 
and a limited form of weak familywise error control in each single step. 
The sequential rejection principle gives a novel theoretical perspec- 
tive on many well-known multiple testing procedures, emphasizing 
the sequential aspect. Its main practical usefulness is for the devel- 
opment of multiple testing procedures for null hypotheses, possibly 
logically related, that are structured in a graph. We illustrate this by 
presenting a uniform improvement of a recently published procedure. 

1. Introduction. Well-known multiple testing procedures that control 
the familywise error are often sequential, in the sense that rejection of some 
of the hypotheses may make rejection of the remaining hypotheses easier. 
A famous example is Holm's (1979) procedure, in which the alpha level for re- 
jection of each null hypothesis depends on the number of previously rejected 
hypotheses. Other classical examples of sequentially rejective multiple test- 
ing procedures include various types of gatekeeping procedures [Dmitrienko 
and Tamhane (2007)], which can be explicitly constructed as sequential, and
the closed testing procedure [Marcus, Peritz and Gabriel (1976)], in which
rejection of a null hypothesis can only occur after all implying intersection 
null hypotheses have been rejected. Other modern multiple testing proce- 
dures, such as the exact resampling-based method of Romano and Wolf 
(2005), as well as recent methods for multiple testing in graphs of logically 
related hypotheses [Goeman and Mansmann (2008), Meinshausen (2008)], 
can also be viewed as sequentially rejective procedures. 

This paper presents a unified approach to the class of sequentially rejec- 
tive multiple testing procedures, emphasizing the sequential aspect. A gen- 
eral sequentially rejective procedure will be constructed as a sequence of 
single-step methods, determined by a rule for setting the rejection regions 
for each null hypothesis based on the current collection of unrejected null 
hypotheses. The general sequentially rejective procedure encompasses all of 
the methods mentioned above, as well as many others. Our work continues 
along the path set out by Romano and Wolf (2005), who wrote of stepwise 
procedures that "an ideal situation would be to proceed at any step without 
regard to previous rejections, in the sense that once a hypothesis is rejected, 
the remaining hypotheses are treated as a new family, and testing for this 
new family proceeds independent of past decisions." We extend the work 
of Romano and Wolf (2005, 2010) to logically related hypotheses and show 
that past decisions can even make the tests in each new family easier, as the 
tests for each new family may assume that all rejections in previous families 
were correct rejections, as in Shaffer's (1986) procedure. By emphasizing the 
role of logical relationships between hypotheses, we are able to demonstrate 
the versatility of sequential rejection as an approach to multiple testing. 

We give a general proof that sequentially rejective multiple testing pro- 
cedures strongly control the familywise error. The proof shows that, for 
any sequentially rejective multiple testing procedure that fulfills a simple 
monotonicity requirement, strong familywise error control of the sequential 
procedure follows from a limited form of weak familywise error control at 
each single step. This property, which can be used to turn a single-step fam- 
ilywise error controlling procedure into a sequential one, is a very general 
principle of familywise error control. We refer to this principle as the se- 
quential rejection principle. It does not depend in any way on the method of 
familywise error control imposed in the single steps and it does not require 
any additional assumptions on the joint distribution of the test statistics, 
aside from what is needed for single-step familywise error control. 

It is a notable feature of the sequential rejection principle that control 
of the familywise error at each single step is only necessary with respect to 
those data distributions for which all previous rejections have been correct 
rejections. As a consequence, the principle facilitates the design of
sequentially rejective multiple testing procedures in situations in which
there are logical relationships between null hypotheses. Also, in other
cases, the principle may aid the design of multiple testing procedures
since, by the principle,
proof of familywise error control of the sequential procedure can be achieved 
by checking monotonicity and proving weak familywise error control of single 
steps, which, typically, is relatively easy to do. 

Earlier generalizations of sequentially rejective testing were formulated by 
Romano and Wolf (2005) and Hommel, Bretz and Maurer (2007), both on 
the basis of the closed testing procedure [Marcus, Peritz and Gabriel (1976)].
Our procedure can be seen as an extension of these procedures, encompassing
both as special cases, as well as some other procedures that these earlier
generalizations do not encompass. The procedure of Hommel, Bretz and Maurer
is limited to Bonferroni-based control at each single step. The procedure of 
Romano and Wolf was originally limited to have identical critical values for 
all hypotheses, although this was generalized by Romano and Wolf (2010). 
Neither Romano and Wolf (2005, 2010) nor Hommel, Bretz and Maurer 
(2007) explicitly considered the issue of logically related hypotheses. 

This paper is organized as follows. We first formulate the general sequen- 
tially rejective multiple testing procedure and a set of sufficient conditions 
under which such a procedure guarantees strong control of the familywise 
error. Together with the formal statements, much attention will be given 
to the development of the intuitions behind the principle. The remaining 
part of the paper is devoted to a review of well-known multiple testing 
procedures, in which we show how important procedures such as Shaffer, 
Hochberg, closed testing, partitioning and gatekeeping procedures can be 
viewed as examples of the general sequentially rejective procedure. The ma- 
jority of sequentially rejective procedures use some version of Bonferroni, 
modified by Shaffer's (1986) treatment of logically related null hypotheses, 
in their single-step control of the familywise error. We go into this specific 
class of procedures in more detail in Section 3. We also give examples of 
non-Bonferroni-based procedures, such as resampling-based multiple testing 
[Romano and Wolf (2005)] and the step-up method of Hochberg (1988), that 
can be viewed as special cases of the general sequentially rejective multiple 
testing procedure, demonstrating that the sequential rejection principle is 
not restricted to Bonferroni-Shaffer-based methods. Next, we demonstrate 
how the sequential rejection principle might be used to improve existing 
procedures by presenting a uniform improvement of the method of Mein- 
shausen (2008) for tree-structured hypotheses. Finally, we show how to cal- 
culate multiplicity-adjusted p-values for the general sequentially rejective 
procedure. 

2. Sequential rejection. The formulation of the general sequentially
rejective procedure and its proof require formal notation. We suppose that
we have a statistical model $\mathbb{M}$, a set for which each
$M \in \mathbb{M}$ indexes a probability measure $P_M$, defined on a common
outcome space $\Omega$. We also suppose that we have a collection
$\mathcal{H}$ of null hypotheses of interest, each of which is a proper
submodel of $\mathbb{M}$, that is, $H \subset \mathbb{M}$ for every
$H \in \mathcal{H}$. Depending on $P_M$, some or all of the hypotheses in
$\mathcal{H}$ may be true null hypotheses. For each $M \in \mathbb{M}$, we
define the collection of true null hypotheses as
$\mathcal{T}(M) = \{H \in \mathcal{H} : M \in H\} \subseteq \mathcal{H}$ and
the collection of false null hypotheses as
$\mathcal{F}(M) = \mathcal{H} \setminus \mathcal{T}(M)$. If desired, the
collection $\mathcal{H}$ may contain an infinite number of hypotheses.
Collections such as $\mathcal{H}$ are collections of sets. We use the
shorthand

$$\bigcup \mathcal{A} = \bigcup_{H \in \mathcal{A}} H$$

when working with such collections of sets (both for unions and for
intersections). We use the phrase almost surely for statements that hold
with probability 1 for every $M \in \mathbb{M}$.

2.1. The principle. We first present the sequential rejection principle in 
a general set-theoretic form that does not involve test statistics and critical 
values. 

In general, we define a sequentially rejective multiple testing procedure
of the hypotheses in $\mathcal{H}$ by choosing a random and measurable
function $\mathcal{N}$ that maps from the power set $2^{\mathcal{H}}$ of all
subsets of $\mathcal{H}$ to itself. We call $\mathcal{N}$ the successor
function and interpret $\mathcal{N}(\mathcal{R})$ as saying what to reject
in the next step of the procedure, after having rejected $\mathcal{R}$ in
the previous step.

The sequentially rejective procedure based on $\mathcal{N}$ iteratively
rejects null hypotheses in the following manner. Let
$\mathcal{R}_i \subseteq \mathcal{H}$, $i = 0, 1, \ldots$, be the collection
of null hypotheses rejected after step $i$. The procedure is defined by

(1) $\mathcal{R}_0 = \emptyset, \qquad \mathcal{R}_{i+1} = \mathcal{R}_i \cup \mathcal{N}(\mathcal{R}_i).$

In short, a sequentially rejective procedure is a procedure that sequentially
chooses hypotheses to reject, based on the collection of hypotheses that have
previously been rejected. Let
$\mathcal{R}_\infty = \lim_{i \to \infty} \mathcal{R}_i$ be the final set of
rejected null hypotheses. Two simple conditions on $\mathcal{N}$ are
sufficient for the procedure (1) to strongly control the familywise error.
These are given in Theorem 1.

Theorem 1 (Sequential rejection principle). Suppose that for every
$\mathcal{R} \subseteq \mathcal{S} \subset \mathcal{H}$, almost surely,

(2) $\mathcal{N}(\mathcal{R}) \subseteq \mathcal{N}(\mathcal{S}) \cup \mathcal{S}$

and that for every $M \in \mathbb{M}$,

(3) $P_M\bigl(\mathcal{N}(\mathcal{F}(M)) \subseteq \mathcal{F}(M)\bigr) \geq 1 - \alpha.$

Then, for every $M \in \mathbb{M}$,

(4) $P_M\bigl(\mathcal{R}_\infty \subseteq \mathcal{F}(M)\bigr) \geq 1 - \alpha.$

A simple outline of the proof of Theorem 1, given below, will give an 
intuitive explanation of the familywise error control of sequentially rejec- 
tive multiple testing procedures. On one hand, condition (3), the single-step 
condition, guarantees familywise error control in the "critical case" in which 
we have rejected all false null hypotheses and none of the true ones. On the 
other hand, condition (2), the monotonicity condition, guarantees that no 
false rejection in the critical case also implies no false rejection in situations 
with fewer rejections than in the critical case so that type I error control 
in the critical case is sufficient for overall familywise error control of the 
sequential procedure. 

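Proof of Theorem 1. Choose any $M \in \mathbb{M}$, use the shorthand
$\mathcal{T} = \mathcal{T}(M)$,
$\mathcal{F} = \mathcal{F}(M) = \mathcal{H} \setminus \mathcal{T}$ and let
$E$ be the event $E = \{\mathcal{N}(\mathcal{F}) \subseteq \mathcal{F}\}$. By
the single-step condition (3), we have $P_M(E) \geq 1 - \alpha$. Suppose that
the event $E$ is realized. We now use induction to prove that, in this case,
$\mathcal{R}_i \subseteq \mathcal{F}$ for all $i$. Obviously,
$\mathcal{R}_0 \subseteq \mathcal{F}$. Now, suppose that
$\mathcal{R}_i \subseteq \mathcal{F}$. By the monotonicity assumption, we
have, almost surely,

$\mathcal{R}_{i+1} \cap \mathcal{T} = \mathcal{N}(\mathcal{R}_i) \cap \mathcal{T} \subseteq \mathcal{N}(\mathcal{F}) \cap \mathcal{T} = \emptyset.$

Therefore, $E$ implies that $\mathcal{R}_i \subseteq \mathcal{F}$ for all
$i$. Hence, for all $i$,

$P_M(\mathcal{R}_i \subseteq \mathcal{F}) \geq P_M(E) \geq 1 - \alpha.$

The corresponding result for $\mathcal{R}_\infty$ follows from the dominated
convergence theorem. □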
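The iteration (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original text: the names `sequential_rejection` and `fixed_sequence_successor`, the hypothesis labels and the p-values are all hypothetical. The successor function used here is the fixed-sequence rule (a basic serial gatekeeping procedure), which tests the first unrejected hypothesis in a prespecified order at level alpha; it satisfies both the monotonicity condition (2) and the single-step condition (3).

```python
from typing import Callable, Dict, FrozenSet, Sequence

Successor = Callable[[FrozenSet[str]], FrozenSet[str]]


def sequential_rejection(successor: Successor, max_steps: int = 10_000) -> FrozenSet[str]:
    """Iterate R_{i+1} = R_i | N(R_i), starting from R_0 = {}, until nothing changes."""
    rejected: FrozenSet[str] = frozenset()
    for _ in range(max_steps):
        updated = rejected | successor(rejected)
        if updated == rejected:  # fixed point reached: this is R_infinity
            return rejected
        rejected = updated
    raise RuntimeError("no fixed point reached within max_steps")


def fixed_sequence_successor(
    p_values: Dict[str, float], order: Sequence[str], alpha: float
) -> Successor:
    """Hypothetical successor function: test the first unrejected hypothesis at alpha."""

    def successor(rejected: FrozenSet[str]) -> FrozenSet[str]:
        for h in order:
            if h not in rejected:
                return frozenset({h}) if p_values[h] <= alpha else frozenset()
        return frozenset()  # everything has already been rejected

    return successor


# Illustrative p-values (made up for this example).
p_values = {"H1": 0.01, "H2": 0.03, "H3": 0.20, "H4": 0.001}
order = ["H1", "H2", "H3", "H4"]
final = sequential_rejection(fixed_sequence_successor(p_values, order, alpha=0.05))
print(sorted(final))
```

In this example the procedure stops at H3, so H4 is never tested even though its p-value is small; this is the price of the prespecified order.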

A simple and general admissibility criterion can be derived from The- 
orem 1 in the case of restricted combinations [Shaffer (1986)]. Restricted 
combinations occur if, for some $\mathcal{R} \subset \mathcal{H}$, there is
no model $M \in \mathbb{M}$ such that $\mathcal{R} = \mathcal{F}(M)$. A
standard example concerns testing pairwise equality of means in a one-way
ANOVA model: if any single null hypothesis is false, it is not possible that
all other null hypotheses are simultaneously true. Let
$\Phi = \{\mathcal{F}(M) : M \in \mathbb{M}\}$, the collection of subsets of
$\mathcal{H}$ that can actually be a collection of false null hypotheses. For
collections $\mathcal{R} \notin \Phi$, the single-step condition sets no
restrictions on $\mathcal{N}(\mathcal{R})$, so $\mathcal{N}(\mathcal{R})$ is
only constrained by the monotonicity condition. Without loss of familywise
error control, we may, therefore, set $\mathcal{N}(\mathcal{R})$ to be the
maximal set allowed by monotonicity, setting

(5) $\mathcal{N}(\mathcal{R}) = \bigcap \{\mathcal{S} \cup \mathcal{N}(\mathcal{S}) : \mathcal{R} \subset \mathcal{S} \in \Phi\}$ for every $\mathcal{R} \notin \Phi$,

interpreting this as $\mathcal{N}(\mathcal{R}) = \mathcal{H}$ if there is no
$\mathcal{S} \in \Phi$ for which $\mathcal{R} \subset \mathcal{S}$. Any
sequential rejection procedure that does not fulfill (5) is inadmissible and
can be uniformly improved by redefining $\mathcal{N}$ such that (5) holds.
2.2. Using test statistics and critical values. We generally think of a
multiple testing procedure as a procedure that involves test statistics and
critical values. To understand the principle, it is helpful to reformulate the
principle in such terms, even when that makes it slightly less general. Assume
that we have a test statistic $S_H : \Omega \to \mathbb{R}$ for each null
hypothesis $H \in \mathcal{H}$, for which large values of $S_H$ indicate
evidence against $H$. In that case, we can construct a successor function
$\mathcal{N}$ by choosing a critical value function
$c = \{c_H\}_{H \in \mathcal{H}}$ for which each $c_H$ maps from the power
set $2^{\mathcal{H}}$ of all subsets of $\mathcal{H}$ to
$\mathbb{R} \cup \{-\infty, \infty\}$. The function $c$ may be either fixed
and chosen in advance before data collection, or it may be random, possibly
even depending on the data, as in permutation testing or other
resampling-based testing. Choosing

(6) $\mathcal{N}(\mathcal{R}) = \{H \in \mathcal{H} \setminus \mathcal{R} : S_H \geq c_H(\mathcal{R})\},$

the function $c_H(\mathcal{R})$ gives the critical value for hypothesis $H$
after the hypotheses in $\mathcal{R}$ have been rejected. Only the values of
$c_H(\mathcal{R})$ for $H \notin \mathcal{R}$ are relevant; the values for
$H \in \mathcal{R}$ do not influence the procedure in any way.

The sequentially rejective procedure based on (6) is a sequence of single- 
step procedures. At each single step, the critical values for all null hypotheses 
are determined by the set $\mathcal{R}_i$ of rejected null hypotheses in the
previous step, or, equivalently, by the set
$\mathcal{H} \setminus \mathcal{R}_i$ of remaining hypotheses. After every step,
the procedure adjusts the critical values on the basis of the new rejected set. 

The monotonicity condition (2) for the choice (6) of
$\mathcal{N}(\mathcal{R})$ is equivalent to the requirement that for every
$\mathcal{R} \subseteq \mathcal{S} \subset \mathcal{H}$ and every
$H \in \mathcal{H} \setminus \mathcal{S}$, we have

(7) $c_H(\mathcal{R}) \geq c_H(\mathcal{S}).$

In the case where the critical value function $c$ is random (see Section 5.2),
the condition (7) should hold almost surely. The condition requires that as
more hypotheses are rejected, the critical values of unrejected null
hypotheses never increase, so that, generally, more rejections in previous
steps allow reduced critical values in subsequent steps. It follows
immediately from condition (7) that for every
$H \in \mathcal{H} \setminus \mathcal{R}_{i+1}$,

(8) $c_H(\mathcal{R}_{i+1}) \leq c_H(\mathcal{R}_i),$

so a sequentially rejective procedure that fulfils the monotonicity condi- 
tion (7) must have nonincreasing critical values at every step. It is important 
to realize, however, that the statement (8) is a substantially weaker state- 
ment than the condition (7) itself. In fact, as an alternative condition, the 
statement (8) is too weak to guarantee familywise error control. We show 
this with a counterexample in Appendix A. This counterexample shows that 
the condition (7) must also hold for sets $\mathcal{R}$ and $\mathcal{S}$
that can never appear as members of the same sequence
$\mathcal{R}_0, \mathcal{R}_1, \ldots$ of sets of rejected hypotheses.
Romano and Wolf (2005) also provide an interesting example illustrating the 
importance of monotonicity in sequentially rejective procedures (Example 6 
in that paper). 
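Because condition (7) must hold for all nested pairs of sets, not only for those visited by an actual run of the procedure, it can be useful to check it exhaustively for small families. The following Python sketch does so by brute force. It is only an illustration: the helper names, the Bonferroni-type normal critical values and the deliberately non-monotone rule `crit_bad` are assumptions made for this example, and the enumeration is exponential in the number of hypotheses.

```python
from itertools import combinations
from statistics import NormalDist


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(hypotheses, crit):
    """Check c_H(R) >= c_H(S) for every R within S, S a proper subset, H outside S."""
    hypotheses = frozenset(hypotheses)
    for s in subsets(hypotheses):
        if s == hypotheses:
            continue  # no hypothesis H is left outside S
        for r in subsets(s):
            for h in hypotheses - s:
                if crit(h, r) < crit(h, s):
                    return False
    return True


def bonferroni_z(alpha, hypotheses):
    """Hypothetical critical values: normal quantile at level alpha / (number remaining)."""
    hypotheses = frozenset(hypotheses)

    def crit(h, rejected):
        remaining = len(hypotheses - rejected)
        return NormalDist().inv_cdf(1 - alpha / remaining)

    return crit


def crit_bad(h, rejected):
    """A rule that violates (7): critical values grow as more hypotheses are rejected."""
    return 2.0 + len(rejected)


hyps = ["A", "B", "C"]
ok = is_monotone(hyps, bonferroni_z(0.05, hyps))
bad = is_monotone(hyps, crit_bad)
print(ok, bad)
```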
The single-step condition (3) translates to the requirement that for every
$\mathcal{R} \subset \mathcal{H}$ and every $M \in \mathbb{M}$ for which
$\mathcal{R} = \mathcal{F}(M)$, we have

(9) $P_M\Bigl(\bigcup_{H \in \mathcal{T}(M)} \{S_H \geq c_H(\mathcal{R})\}\Bigr) \leq \alpha.$

The condition (9) requires a limited form of weak familywise error control at
each individual step. The most notable feature of this limited form of control
is that it is not necessary to control the familywise error for all possible
data distributions $M \in \mathbb{M}$, but only for those distributions for
which $\mathcal{R} = \mathcal{F}(M)$. This clause relaxes the required
control in two useful ways. On one hand, we may assume that
$\mathcal{R} \supseteq \mathcal{F}(M)$, which implies that all nonrejected
null hypotheses are true. Therefore, the required familywise error control
of condition (9) is no more than weak control. On the other hand, we may
assume that $\mathcal{R} \subseteq \mathcal{F}(M)$, which implies that all
rejected hypotheses are false. This latter aspect of the single-step
condition is relevant in the case of logical relations or substantial overlap
between null hypotheses, and makes it easy to exploit such relationships, for
example, in the manner of Shaffer (1986). In the case of restricted
combinations, the admissibility condition (5) can be used, which translates to

$c_H(\mathcal{R}) = \max_{\mathcal{S} \in \Phi :\, H \notin \mathcal{S},\, \mathcal{R} \subset \mathcal{S}} c_H(\mathcal{S}),$

as a condition on critical values.

Because of the exploitation of relationships between hypotheses, the re- 
quired control of condition (9) is very limited: it is even weaker than weak 
control. In this context, it is important to note that the "local test" that 
is implicit in the single-step condition (9), which rejects if
$S_H \geq c_H(\mathcal{R})$ for any $H \in \mathcal{H} \setminus \mathcal{R}$,
is not generally a valid local test of the intersection hypothesis
$\bigcap(\mathcal{H} \setminus \mathcal{R})$ in the sense of the closed
testing procedure. As condition (9) only needs to control the familywise
error for those $M \in \mathbb{M}$ for which
$\mathcal{T}(M) \cap \mathcal{R} = \emptyset$, that is, only for
$M \notin \bigcup \mathcal{R}$, the test only needs to be a valid test for
the more restricted hypothesis
$\{\bigcap(\mathcal{H} \setminus \mathcal{R})\} \setminus \bigcup \mathcal{R}$.
The latter hypothesis is part of the partitioning of $\mathcal{H}$ rather
than of its closure (see Section 4).
As the single-step condition only needs to control the probability of falsely 
rejecting this more restricted hypothesis, it has potential for a gain in power 
over closed testing-based procedures. 

As with the monotonicity condition, the single-step condition must hold 
for every $\mathcal{R}$ for which $\mathcal{R} = \mathcal{F}(M)$ for some
$M \in \mathbb{M}$, even if it can never appear as a member of an actual
sequence $\mathcal{R}_0, \mathcal{R}_1, \ldots$ of sets of rejected
hypotheses. 

As a side note, it can be remarked that it is conventional, but not
necessary, to use closed rejection sets in (6), rejecting when
$S_H \geq c_H(\mathcal{R})$. We may just as well define an analogous
sequentially rejective multiple testing procedure based on open rejection
sets, defining

(10) $\mathcal{N}(\mathcal{R}) = \{H \in \mathcal{H} \setminus \mathcal{R} : S_H > k_H(\mathcal{R})\}$

for some critical value function $k = \{k_H\}_{H \in \mathcal{H}}$. This
open-set-based procedure will be important in Section 5.2.

3. Bonferroni-Shaffer-based methods. There is a large class of sequen- 
tially rejective multiple testing procedures that fulfil the single-step condi- 
tion through an inequality we call the Bonferroni-Shaffer inequality: the 
Bonferroni inequality combined with Shaffer's (1986) treatment of logically 
related hypotheses. In this section, we review examples of such methods and 
show that they all conform to the general sequentially rejective multiple 
testing procedure described in the previous section. 

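All Bonferroni-Shaffer-based methods start from raw p-values
$\{p_H\}_{H \in \mathcal{H}}$ for each hypothesis, which have the property
that for every $H \in \mathcal{T}(M)$ and every $\alpha \in [0, 1]$,

(11) $P_M(p_H \leq \alpha) \leq \alpha.$

We may define a sequentially rejective multiple testing procedure directly
for the raw p-values. Analogous to choosing the critical value function $c$,
choose some function $\alpha = \{\alpha_H\}_{H \in \mathcal{H}}$, for which
$\alpha_H : 2^{\mathcal{H}} \to [0, 1]$ for every $H \in \mathcal{H}$, and set

(12) $\mathcal{N}(\mathcal{R}) = \{H \in \mathcal{H} \setminus \mathcal{R} : p_H \leq \alpha_H(\mathcal{R})\}.$

It will be helpful to restate some of the inequalities of the previous section
in terms of $\{p_H\}_{H \in \mathcal{H}}$ and $\alpha(\cdot)$. It follows from
Theorem 1 that the procedure based on (12) controls the familywise error if it
fulfils the monotonicity condition that

$\alpha_H(\mathcal{R}) \leq \alpha_H(\mathcal{S})$

for every $\mathcal{R} \subseteq \mathcal{S} \subset \mathcal{H}$ and every
$H \in \mathcal{H} \setminus \mathcal{S}$, and the single-step condition that

$P_M\Bigl(\bigcup_{H \in \mathcal{T}(M)} \{p_H \leq \alpha_H(\mathcal{R})\}\Bigr) \leq \alpha$

for every $\mathcal{R} \subset \mathcal{H}$ and for every $M \in \mathbb{M}$
for which $\mathcal{R} = \mathcal{F}(M)$.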

The Bonferroni-Shaffer-based methods make use of the following inequality in
the single-step condition. If $\mathcal{R} \subset \mathcal{H}$ and
$\mathcal{T}(M) \cap \mathcal{R} = \emptyset$, we have

(13) $P_M\Bigl(\bigcup_{H \in \mathcal{T}(M)} \{p_H \leq \alpha_H(\mathcal{R})\}\Bigr) \leq \sum_{H \in \mathcal{T}(M)} \alpha_H(\mathcal{R}) \leq \sum_{H \in \mathcal{H} \setminus \mathcal{R}} \alpha_H(\mathcal{R})$

and we can control the left-hand side by controlling either the right-hand
side term (the classical Bonferroni inequality) or the middle term (Shaffer's
improvement). The difference between the middle term and the right-hand
side term of (13) is important in the case of logical implications between
null hypotheses.

Many well-known multiple testing procedures make use of the inequal- 
ity (13) for their single-step condition. These methods have exact familywise 
error control if the p-values they are based on conform to (11) exactly and
asymptotic control if the p-values conform to (11) asymptotically. We review
a number of them briefly below. The methods described in Section 6 and 
even those in Section 4 can also be seen as Bonferroni-Shaffer-based. 

Holm's procedure is explicitly sequential, as the title of his paper (1979)
clearly states. Let $|\cdot|$ indicate the cardinality of a set and suppose
that $|\mathcal{H}|$ is finite. The critical value function of Holm's
procedure is given by

$\alpha_H(\mathcal{R}) = \frac{\alpha}{|\mathcal{H} \setminus \mathcal{R}|}.$

The monotonicity condition holds because
$|\mathcal{H} \setminus \mathcal{R}| \geq |\mathcal{H} \setminus \mathcal{S}|$
if $\mathcal{R} \subseteq \mathcal{S}$, and the single-step condition follows
immediately from the Bonferroni inequality (13). This construction trivially
extends to the weighted version of Holm's procedure.
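Holm's critical value function plugs directly into the successor function (12). The Python sketch below is a minimal illustration of this; the function name `holm` and the p-values are hypothetical and chosen only to show that the sequential procedure can reject more than single-step Bonferroni.

```python
def holm(p_values, alpha):
    """Sequential rejection with alpha_H(R) = alpha / |H \\ R| (Holm's procedure)."""
    hypotheses = frozenset(p_values)
    rejected = frozenset()
    while rejected != hypotheses:
        remaining = hypotheses - rejected
        level = alpha / len(remaining)  # same level for every remaining hypothesis
        new = frozenset(h for h in remaining if p_values[h] <= level)
        if not new:  # N(R) is empty: the procedure has converged
            break
        rejected |= new
    return rejected


# Illustrative p-values (made up for this example).
p_values = {"a": 0.005, "b": 0.015, "c": 0.03, "d": 0.5}
alpha = 0.05

holm_rejected = holm(p_values, alpha)
bonferroni_rejected = frozenset(
    h for h, p in p_values.items() if p <= alpha / len(p_values)
)
print(sorted(holm_rejected), sorted(bonferroni_rejected))
```

Here the first step rejects "a" at level 0.05/4, after which "b" is rejected at level 0.05/3; single-step Bonferroni stops after "a".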

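In the case of logical relationships between hypotheses, we may obtain
uniformly more powerful procedures by setting
$\alpha_H(\mathcal{R}) = \alpha / |\mathcal{H} \setminus \mathcal{R}|$ for all
$\mathcal{R} \in \Phi$, as in Holm's procedure, and use the condition (5) to
obtain improved critical values for $\mathcal{R} \notin \Phi$. We set

$\alpha_H(\mathcal{R}) = \min_{\mathcal{S} \in \Phi :\, H \notin \mathcal{S},\, \mathcal{R} \subset \mathcal{S}} \alpha_H(\mathcal{S})$

for all $\mathcal{R} \notin \Phi$, which results in the critical value function

$\alpha_H(\mathcal{R}) = \min_{M \in H :\, \mathcal{T}(M) \cap \mathcal{R} = \emptyset} \frac{\alpha}{|\mathcal{T}(M)|}.$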

This is the so-called "P3" procedure of Hommel and Bernhard (1999). This
procedure is a uniform improvement over the earlier "S2" procedure of
Shaffer (1986), which has critical value function

(14) $\alpha_H(\mathcal{R}) = \min_{M :\, \mathcal{T}(M) \cap \mathcal{R} = \emptyset} \frac{\alpha}{|\mathcal{T}(M)|}.$

Shaffer's procedure may be obtained by taking

$\alpha_H(\mathcal{R}) = \min_{\mathcal{S} \in \Phi :\, \mathcal{R} \subseteq \mathcal{S}} \frac{\alpha}{|\mathcal{H} \setminus \mathcal{S}|}$

for all $\mathcal{R} \notin \Phi$, using a weaker version of condition (5).
The monotonicity and single-step conditions for the "S2" and "P3" procedures
may also be checked directly from Theorem 1. Monotonicity is trivial and
single-step control follows immediately from the Bonferroni-Shaffer
inequality (13).
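As a concrete illustration of (14), consider the three pairwise comparisons among three means mentioned in Section 2.1. The Python sketch below is an assumption-laden example, not the authors' implementation: the model is represented simply by the list `true_sets` of collections that can be the set of true null hypotheses, and the labels and p-values are made up. Exactly two true pairwise hypotheses is impossible, so after one rejection at most one remaining hypothesis can be true and the level rises from alpha/3 directly to alpha.

```python
def shaffer_s2(p_values, true_sets, alpha):
    """Sequential rejection with the S2 level (14): alpha / max |T(M)| over the
    models M whose true hypotheses are disjoint from the rejected set."""
    hypotheses = frozenset(p_values)
    rejected = frozenset()
    while True:
        sizes = [len(t) for t in true_sets if not (t & rejected)]
        max_true = max(sizes, default=0)
        # If no remaining hypothesis can be true, everything left may be rejected.
        level = alpha / max_true if max_true > 0 else float("inf")
        new = frozenset(h for h in hypotheses - rejected if p_values[h] <= level)
        if not new:
            break
        rejected |= new
    return rejected


# Possible collections of true hypotheses for pairwise equality of three means.
true_sets = [
    frozenset(),                       # all means differ
    frozenset({"H12"}),
    frozenset({"H13"}),
    frozenset({"H23"}),
    frozenset({"H12", "H13", "H23"}),  # all means equal
]

# Illustrative p-values (made up for this example).
p_values = {"H12": 0.01, "H13": 0.04, "H23": 0.30}
s2_rejected = shaffer_s2(p_values, true_sets, alpha=0.05)
print(sorted(s2_rejected))
```

With the same p-values, Holm's procedure would test the two remaining hypotheses at level 0.05/2 after the first rejection and would therefore not reject H13.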

4. Closed testing and partitioning. The general closed testing [Marcus, 
Peritz and Gabriel (1976)] and partitioning procedures [Finner and
Strassburger (2002)] are fundamental principles of multiple testing in their
own right. Still, as we shall show in this section, even in their most general
formulation, both principles can be derived as special cases of the sequentially
rejective procedure and the Bonferroni-Shaffer inequality, provided that the 
collection of hypotheses H is extended to include the closure or the parti- 
tioning of these hypotheses, respectively. 

Even though we can view closed testing and partitioning as special cases 
of sequential Bonferroni-Shaffer methods in this way, the procedures are dif- 
ferent from the Bonferroni-Shaffer-based procedures described earlier. They 
ensure that, before a false rejection has been made, there is never more 
than one true null hypothesis $H \in \mathcal{T}(M)$ that has
$\alpha_H(\mathcal{R}) > 0$. Therefore, they control the sum in (13) through
the number of terms, rather than through their magnitude. This makes closed
testing and partitioning less conservative than some other methods, which is
illustrated by the fact that, unlike most Bonferroni-Shaffer-based procedures,
these methods never give multiplicity-adjusted p-values (see Section 7) that
are exactly 1 unless there are raw p-values which are exactly 1.

It is interesting to note that the relationship described in this section 
between closed testing and partitioning on the one hand, and sequential 
methods on the other, is reversed relative to the traditional one. It has often 
been observed that sequential methods such as Holm's can be derived as 
special cases of closed testing or partitioning. Here, we show, conversely, 
that closed testing and partitioning procedures, in their most general forms, 
can be derived as special cases of sequential rejection methods. 

4.1. Closed testing. The closed testing procedure was formulated by Marcus,
Peritz and Gabriel (1976). It requires that the set $\mathcal{H}$ of null
hypotheses be closed with respect to intersection, that is, for every
$H \in \mathcal{H}$ and $J \in \mathcal{H}$, we must have
$H \cap J \in \mathcal{H}$, unless $H \cap J = \emptyset$. If necessary, the
set $\mathcal{H}$ may be recursively extended to include all nonempty
intersection hypotheses. Define
$i(H) = \{J \in \mathcal{H} : J \subset H\}$ as the set of all implying null
hypotheses of $H$.

We consider the most general form of the closed testing procedure here, 
placing no restrictions on the local test statistic $S_H$ used to obtain the
marginal p-values $p_H$ for each intersection hypothesis $H \in \mathcal{H}$.
The closed testing procedure is sequential. It starts by testing all
hypotheses which have no implying null hypotheses in $\mathcal{H}$ (typically,
this is only $\bigcap \mathcal{H}$, the intersection of all null hypotheses).
If at least one of these hypotheses is rejected, then the procedure continues
to test all null hypotheses for which all implying null hypotheses have been
rejected, until no more rejection occurs. All tests are done at level
$\alpha$. In terms of the general sequentially rejective procedure, the
critical value function is given by

$$\alpha_H(\mathcal{R}) = \begin{cases} \alpha, & \text{if } i(H) \subseteq \mathcal{R}, \\ 0, & \text{otherwise.} \end{cases}$$

The closed testing procedure conforms to the conditions of Theorem 1.
The monotonicity is immediate from the definition of the critical value
function. The single-step condition follows from the Shaffer inequality (13)
in the following way. Assume that
$\mathcal{R} \cap \mathcal{T}(M) = \emptyset$. Consider
$T = \bigcap \mathcal{T}(M)$, the intersection of all true null hypotheses.
As $M \in T$, $T$ is not empty and, by the closure assumption,
$T \in \mathcal{H}$ and even $T \in \mathcal{T}(M)$, so
$T \notin \mathcal{R}$. For every $H \in \mathcal{T}(M)$ for which
$H \neq T$, we have $T \in i(H)$ and therefore
$i(H) \not\subseteq \mathcal{R}$. Hence,

$\sum_{H \in \mathcal{T}(M)} \alpha_H(\mathcal{R}) \leq \alpha_T(\mathcal{R}) \leq \alpha,$

which proves the single-step condition.

The practical value of this construction is algorithmic. The sequentially 
rejective view of closed testing emphasizes that it is not usually required 
to calculate all intersection hypotheses tests, but only those for which all 
implying hypotheses have been rejected in previous steps. At the cost of
some bookkeeping, this may greatly reduce the number of tests which must 
be performed. 
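The bookkeeping can be sketched as follows in Python. The sketch rests on assumptions that are not in the original text: hypotheses are in free combination, so that every nonempty set of elementary indices defines a distinct intersection hypothesis, and the local test is a Bonferroni test on the elementary p-values. The function name and the p-values are hypothetical. An intersection hypothesis is tested only once all hypotheses implying it (the strict supersets of its index set) have been rejected.

```python
from itertools import combinations


def closed_testing(p_values, alpha):
    """Sequentially rejective closed testing with Bonferroni local tests (assumed)."""
    elementary = sorted(p_values)
    # Each intersection hypothesis is identified with its set of elementary indices.
    family = [
        frozenset(c)
        for r in range(1, len(elementary) + 1)
        for c in combinations(elementary, r)
    ]

    def local_p(index_set):
        return min(1.0, len(index_set) * min(p_values[i] for i in index_set))

    rejected, tested = set(), set()
    while True:
        # Testable now: not yet tested, and every implying hypothesis is rejected.
        candidates = [
            h for h in family
            if h not in tested and all(j in rejected for j in family if h < j)
        ]
        tested.update(candidates)
        new = {h for h in candidates if local_p(h) <= alpha}
        if not new:
            break
        rejected |= new
    elementary_rejected = {next(iter(h)) for h in rejected if len(h) == 1}
    return elementary_rejected, len(tested)


# Illustrative p-values (made up for this example).
p_values = {1: 0.005, 2: 0.015, 3: 0.03, 4: 0.5}
elementary_rejected, n_tests = closed_testing(p_values, alpha=0.05)
print(sorted(elementary_rejected), n_tests)
```

In this small example, 13 of the 15 intersection hypotheses are tested; the two elementary hypotheses whose implying hypothesis is retained are never evaluated. With Bonferroni local tests the elementary rejections coincide with those of Holm's procedure.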

4.2. Partitioning. The closed testing principle of Marcus, Peritz and 
Gabriel (1976) has been a cornerstone of multiple hypotheses testing for 
decades. However, Stefansson, Kim and Hsu (1988) introduced what is now 
called the partitioning principle, and Finner and Strassburger (2002) showed 
that the partitioning principle gives a multiple testing procedure that is at 
least as powerful as closed testing and which may be more powerful in some 
situations. 

The main idea is to partition the union of the hypotheses of interest into 
disjoint sub-hypotheses such that each hypothesis can be represented as the 
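disjoint union of some of them. We refer to the collection of these
sub-hypotheses as the partitioning $\mathcal{P}$ and include it in
$\mathcal{H}$. Formally, we assume that $\mathcal{P} \subseteq \mathcal{H}$,
where $\mathcal{P}$ is such that for any $J$ and $K$ in $\mathcal{P}$ with
$J \neq K$, we have $J \cap K = \emptyset$, and for each $H \in \mathcal{H}$,
$H = \bigcup \mathcal{K}$ for some $\mathcal{K} \subseteq \mathcal{P}$. The
set $\mathcal{H}$ may have to be extended by its partitioning to make
$\mathcal{P} \subseteq \mathcal{H}$ hold true.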

As in closed testing above, we put no restrictions on the test statistics 
used; however, the procedure only actually uses the marginal p-values $p_H$
for $H \in \mathcal{P}$. In terms of the general sequentially rejective
procedure, the critical value function for the hypotheses in
$\mathcal{H} \setminus \mathcal{R}$ is given by

$$\alpha_H(\mathcal{R}) = \begin{cases} \alpha, & \text{if } H \in \mathcal{P}, \\ 1, & \text{if } H \in \mathcal{H} \setminus \mathcal{P} \text{ and } H \subseteq \bigcup \mathcal{R}, \\ 0, & \text{otherwise.} \end{cases}$$

As a sequentially rejective procedure, the partitioning procedure never
requires more than two steps. In the first step, the procedure rejects only
those hypotheses that are part of the partitioning and in the second, it
rejects any hypotheses implied by the union of the rejected partitioning
hypotheses.

To prove familywise error control through the sequential rejection principle,
we check the monotonicity and single-step conditions. Monotonicity is
trivial. Let $\mathcal{T}(M) \cap \mathcal{R} = \emptyset$. The single-step
condition follows trivially from the Shaffer inequality (13) by writing

(15) $\sum_{H \in \mathcal{T}(M)} \alpha_H(\mathcal{R}) = \sum_{H \in \mathcal{T}(M) \cap \mathcal{P}} \alpha_H(\mathcal{R}) + \sum_{H \in \mathcal{T}(M) \setminus \mathcal{P}} \alpha_H(\mathcal{R}).$

We have $|\mathcal{T}(M) \cap \mathcal{P}| \leq 1$ because the hypotheses in
$\mathcal{P}$ are disjoint, and $\alpha_H(\mathcal{R}) = 0$ for every
$H \in \mathcal{T}(M) \setminus \mathcal{P}$ since $H \in \mathcal{T}(M)$ with
$\mathcal{T}(M) \cap \mathcal{R} = \emptyset$ implies
$H \not\subseteq \bigcup \mathcal{R}$. The right-hand side of (15) is
therefore bounded by $\alpha$.

As for the relationship between sequential rejection and partitioning, it 
is interesting to remark that it is possible to construct an alternative proof 
of Theorem 1 that constructs the sequential rejection principle as a par- 
titioning procedure with shortcuts [see Calian, Li and Hsu (2008) for the 
definition of shortcuts in the partitioning procedure]. Combined with the 
result of this subsection, this suggests an interesting duality between se- 
quential rejection and partitioning: sequential rejection is partitioning with 
shortcuts, while partitioning is sequential rejection based on an augmented 
collection of hypotheses. The same alternative construction of sequential re- 
jection based on shortcuts also makes it easier to compare the sequential 
rejection principle with earlier treatments of sequential testing, such as that 
of Hommel, Bretz and Maurer (2007), which are constructed using shortcuts 
in the closed testing procedure. In contrast to these methods, the sequential 
rejection procedure can exploit some of the additional power of partitioning 
relative to closed testing [Finner and Strassburger (2002)], especially in the 
case of logical relationships or overlap between hypotheses. 

A simple example may serve to illustrate the relationships between
partitioning, closed testing and sequential rejection. Let $\Delta > 0$ and
let $H_1 : \theta \leq \Delta$, $H_2 : \theta \geq -\Delta$ and
$H_{12} = H_1 \cap H_2$ be the three hypotheses of interest. Closed testing
would start by testing $H_{12}$ at level $\alpha$ and proceed to test $H_1$
and $H_2$, both at $\alpha$, once $H_{12}$ is rejected. Sequential rejection
may similarly start testing $H_{12}$ at level $\alpha$. After rejection of
$H_{12}$, at the second step, however, it may assume, for all subsequent
tests, that $H_{12}$ is truly false. As a consequence, it may simultaneously
test $H_1$ using a test for $H_1' : \theta < -\Delta$ and $H_2$ using a test
for $H_2' : \theta > \Delta$, performing both tests at level $\alpha$ because
$H_1'$ and $H_2'$ are disjoint. The latter tests may be more powerful than the
original tests for $H_1$ and $H_2$. Note that the partitioning procedure would
start immediately by constructing $H_1'$ and $H_2'$, and would come to exactly
the same qualitative conclusion regarding the hypotheses of interest as the
sequential rejection principle.

5. Non-Bonferroni-based methods. The Bonferroni-Shaffer inequality al- 
lows control of the familywise error with only assumptions on the marginal 
distribution of each test statistic and no additional assumptions on their
joint distribution. Implicitly, the methods based on that inequality (except
closed testing and partitioning) assume the worst possible joint distribution
for familywise error control, which is the distribution for which all
rejection regions are disjoint. If the joint distribution is more favorable,
Bonferroni-Shaffer-based methods may be conservative, controlling the
familywise error at a level lower than the nominal $\alpha$ level. Improved
results may be obtained for distributions that are more favorable than the
worst case of Bonferroni-Shaffer, but only at the cost of additional
assumptions.

The sequential rejection principle is not limited to methods based on the 
Bonferroni-Shaffer inequality (13), but may also be used in combination 
with other methods to ensure the single-step familywise error condition. 

5.1. Sidak's inequality. For example, we may be willing to assume that
Sidak's (1967) inequality,

$$P_M\Big(\bigcap_{H \in \mathcal{T}(M)} \{S_H \le s_H\}\Big) \ge \prod_{H \in \mathcal{T}(M)} P_M(S_H \le s_H),$$

holds for every $M \in \mathbb{M}$ and for all constants
$\{s_H\}_{H \in \mathcal{H}}$, as it does for test statistics independent
under the null. In that case, it is possible to define a sequentially
rejective procedure based on the critical value function

$$\alpha_H(\mathcal{R}) = 1 - (1 - \alpha)^{1/|\mathcal{H} \setminus \mathcal{R}|}$$

for the raw p-values $\{p_H\}_{H \in \mathcal{H}}$ based on the test
statistics $\{S_H\}_{H \in \mathcal{H}}$. This is the step-down Sidak
procedure [Holland and DiPonzio Copenhaver (1987)]. Its familywise error
control can be proven from Theorem 1 using Sidak's inequality.
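
The step-down Sidak procedure described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function name `sidak_step_down` and the dictionary input format are hypothetical, and the raw p-values are taken as given.

```python
def sidak_step_down(pvalues, alpha=0.05):
    """Sequentially rejective Sidak procedure on a dict {name: raw p-value}.

    Hypothetical helper for illustration; validity relies on Sidak's
    inequality holding for the test statistics.
    """
    rejected = set()
    while len(rejected) < len(pvalues):
        remaining = len(pvalues) - len(rejected)
        # Critical value 1 - (1 - alpha)^(1 / |H \ R|), shared by all
        # hypotheses that have not yet been rejected.
        critical = 1.0 - (1.0 - alpha) ** (1.0 / remaining)
        new = {h for h, p in pvalues.items()
               if h not in rejected and p <= critical}
        if not new:
            break
        rejected |= new
    return rejected


if __name__ == "__main__":
    p = {"H1": 0.010, "H2": 0.016, "H3": 0.024, "H4": 0.400}
    print(sorted(sidak_step_down(p, alpha=0.05)))
```

Each pass recomputes the critical value from the number of hypotheses still unrejected, which is what makes the procedure sequential rather than single-step.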

5.2. Resampling-based multiple testing. A completely different approach 
to avoiding the conservativeness associated with the Bonferroni-Shaffer in- 
equality is to use resampling techniques to let the multiple testing procedure 
estimate or accommodate the actual dependence structure between the test 
statistics. 

Well-known resampling-based multiple testing procedures use the fact that
the single-step and monotonicity conditions can both be kept by taking
$c_H(\mathcal{R})$ as the maximum over $M \in \mathbb{M}$ of the $1 - \alpha$
quantile of the distribution of $\max_{H \in \mathcal{T}(M)} S_H$ [Romano and
Wolf (2005)]. This quantile is usually unknown, but it may be estimated by
resampling methods, provided we are willing to make additional assumptions.
Westfall and Young (1993) assume subset pivotality, which asserts that for
every $M \in \mathbb{M}$, there is some $N \in \bigcap_{H \in \mathcal{H}} H$
such that the distribution of $\max_{H \in \mathcal{T}(M)} S_H$ is identical
under $P_M$ and $P_N$. Under this assumption, resampling of
$\{S_H\}_{H \in \mathcal{H} \setminus \mathcal{R}}$ under the complete null
hypothesis, using permutations or the bootstrap, can give consistent
estimates of the desired quantiles. The subset pivotality condition has been
the subject of some discussion [Dudoit, Van der Laan and Pollard (2004),
Westfall and Troendle (2008)] and several authors have given alternative
assumptions that allow estimation of the quantiles of interest [Romano and
Wolf (2005), Dudoit and Van der Laan (2008)]. Whatever the underlying
assumptions, consistent estimation of the quantiles of
$\max_{H \in \mathcal{T}(M)} S_H$ only guarantees control of the familywise
error in an asymptotic sense. Asymptotic control of the familywise error is
beyond the scope of this article.

We focus instead on resampling-based methods with exact familywise er- 
ror control, putting these into the framework of the sequential rejection 
principle. Following Romano and Wolf (2005), we may obtain exact con- 
trol by generalizing the treatment of permutation testing in Lehmann and 
Romano [(2005), Chapter 15] to a multiple testing procedure. This method 
does not explicitly strive to estimate the quantiles of the distribution of
$\max_{H \in \mathcal{T}(M)} S_H$.

To define a resampling-based sequentially rejective multiple testing
procedure with exact familywise error control, we choose a set
$\pi = \{\pi_1, \ldots, \pi_r\}$ of $r$ functions that we shall refer to as
"null-invariant transformations," or null-invariants, each of which is a
bijection from the outcome space $\Omega$ onto itself. For everything to be
well defined, we must assume that the null-invariants map every measurable
set onto a measurable set, but we will not concern ourselves with such
technicalities here. Using the null-invariants, we can define transformed
test statistics $S_H \circ \pi_i$ for every $H \in \mathcal{H}$ and
$i \in \{1, \ldots, r\}$. The name null-invariants for the transformations
$\pi$ comes from assumption (17) below, which says that transformation of
$S_H$ by $\pi_i$ does not change the distribution of $S_H$ if $H$ is a true
null hypothesis.

For the sake of concreteness, we give some motivating examples of
null-invariant transformations which fulfill the universal null-invariance
condition (17). Let $\stackrel{d}{=}$ denote equality in distribution. As a
first example, in a one-sample situation, suppose that for $n$ i.i.d.
subjects, we have sampled a $p$-dimensional vector $X = \{X_i\}_{i=1}^p$,
symmetrically distributed around a vector $\theta = \{\theta_i\}_{i=1}^p$,
that is,

$$X - \theta \stackrel{d}{=} \theta - X.$$

If we want to test $H_i: \theta_i = 0$ for $i = 1, \ldots, p$ with Student T-
or Wilcoxon statistics, then all $2^n$ transformations that map the measured
$X$ to $-X$ for a subset of the $n$ subjects are null-invariant
transformations. Secondly, in a two-sample situation, suppose that we have
an i.i.d. sample of size $n$ from a $p$-dimensional vector
$X = \{X_i\}_{i=1}^p$ and an independent i.i.d. sample of size $m$ from a
$p$-dimensional vector $Y = \{Y_i\}_{i=1}^p$, and that

$$X \stackrel{d}{=} Y + \theta$$

for some vector $\theta = \{\theta_i\}_{i=1}^p$. If we want to test
$H_i: \theta_i = 0$ for $i = 1, \ldots, p$ with Student T- or Mann-Whitney
statistics, the usual permutations of the
group labels are all null-invariant transformations.

The sequentially rejective procedure based on $\pi$ will be defined as
follows. Let $s = r - \lfloor \alpha r \rfloor$, where
$\lfloor \alpha r \rfloor$ is the largest integer that is at most equal to
$\alpha r$. For any test statistic $S$, define the random variable
$(S \circ \pi)_{(s)}$ as the $s$th smallest value among
$S \circ \pi = \{S \circ \pi_i\}_{i=1}^r$. It is convenient to define the
sequentially rejective multiple testing procedure based on the
null-invariants $\pi$ using the open rejection set variant (10) of the
general procedure. The critical value function is given by

(16) $$k_H(\mathcal{R}) = \Big(\max_{J \in \mathcal{H} \setminus \mathcal{R}} S_J \circ \pi\Big)_{(s)}.$$

Note that, in contrast with all procedures described above, the critical
values $k_H(\mathcal{R})$ are random variables. The notation for the critical
values in (16) is suggestive of the algorithm for permutation-based multiple
testing of Ge, Dudoit and Speed (2003).
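
The critical value (16) can be illustrated for the one-sample example, using the full group of $2^n$ sign flips as null-invariants. The sketch below makes several assumptions of our own: the function names are hypothetical, the test statistic is the absolute column sum (a simple stand-in for the Student T- or Wilcoxon statistics mentioned in the text), and all sign flips are enumerated, which is only feasible for small $n$.

```python
from itertools import product
from math import floor


def sign_flip_statistics(data, signs):
    """|sum_i g_i * x_ij| for each coordinate j under the sign vector g."""
    p = len(data[0])
    return [abs(sum(g * row[j] for g, row in zip(signs, data)))
            for j in range(p)]


def resampling_sequential_rejection(data, alpha=0.05):
    """Sequentially rejective sign-flip procedure; data is a list of rows."""
    n, p = len(data), len(data[0])
    flips = list(product((1, -1), repeat=n))   # the full group of 2^n flips
    r = len(flips)
    s = r - floor(alpha * r)
    transformed = [sign_flip_statistics(data, g) for g in flips]
    observed = transformed[0]                  # flips[0] is the identity
    rejected = set()
    while len(rejected) < p:
        active = [j for j in range(p) if j not in rejected]
        # Maximum over unrejected hypotheses, one value per transformation.
        maxima = sorted(max(stats[j] for j in active) for stats in transformed)
        k = maxima[s - 1]                      # s-th smallest value, as in (16)
        # Open rejection set variant: reject only on strict exceedance.
        new = {j for j in active if observed[j] > k}
        if not new:
            break
        rejected |= new
    return rejected


if __name__ == "__main__":
    data = [[2.1, 0.3], [1.9, -0.2], [2.4, 0.1],
            [2.0, -0.4], [2.2, 0.2], [1.8, -0.1]]
    print(resampling_sequential_rejection(data, alpha=0.05))
```

Because the maximum is taken only over hypotheses not yet rejected, the critical value can only decrease as rejections accumulate, which is the step-down gain over the single-step version.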

The familywise error control of the procedure based on (16) can be proven 
with the open rejection set version of the sequential rejection principle of 
Theorem 1, although not without additional assumptions. The monotonicity 
of the critical values for every outcome $\omega \in \Omega$ is immediate
from the definition. To prove the single-step condition, we use Theorem 2,
adapted from
Theorem 15.2.1 of Lehmann and Romano (2005), which considers testing of 
a single hypothesis. The proof of the theorem is given in Appendix B. 

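Theorem 2. Suppose that the transformations $\{\pi_1, \ldots, \pi_r\}$ form
a group in the algebraic sense. Also, suppose that for every
$M \in \mathbb{M}$ and for every $i \in \{1, \ldots, r\}$,

(17) $$\{S_H \circ \pi\}_{H \in \mathcal{T}(M)} \stackrel{d}{=} \{S_H \circ \pi \circ \pi_i\}_{H \in \mathcal{T}(M)},$$

where $\stackrel{d}{=}$ denotes equality in (joint) distribution. Then, for
every $M \in \mathbb{M}$,

(18) $$P_M\Big(\bigcup_{H \in \mathcal{T}(M)} \{S_H > k_H(\mathcal{H} \setminus \mathcal{T}(M))\}\Big) \le \alpha.$$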

The condition of Theorem 2 that the transformations $\pi$ form an algebraic
group is not very stringent. The typical null-invariant transformations,
such as permutations, that are frequently used in hypothesis testing usually
meet this requirement. Instead of the complete group
$\{\pi_1, \ldots, \pi_r\}$, we may also take a random sample (with or without
replacement) from the group. It is easy to verify in the proof of Theorem 2
that, in that case, the result (18) of the theorem holds in expectation over
the sampling distribution.

The other condition, (17), which we call the universal null-invariance
condition, is more crucial. This condition requires that the joint
distribution of the test statistics of the true null hypotheses and their
transformations by $\pi$ is not altered by another application of a
transformation in $\pi$. This motivates the naming of the transformations as
"null-invariants." The condition is a generalization of the randomization
hypothesis for single hypothesis tests [Lehmann and Romano (2005), page
633], which says that under the null hypothesis, the distribution of the
data is not affected by the transformations in $\pi$. In many situations, a
practical way to check the condition (17) is to check the randomization
hypothesis for the subset of the data that is used for the calculation of
$\{S_H\}_{H \in \mathcal{T}(M)}$.

The crucial "universal" part of the universal null-invariance condition is
that the set of null-invariants $\{\pi_1, \ldots, \pi_r\}$ is not allowed to
depend on $H$, $M$ or $\mathcal{R}$: the same set of transformations must be
null-invariant for the joint distribution of all true null hypotheses in
every model $M$ and for every step of the procedure.

5.3. Step-up methods. Sequential rejection is immediately associated with 
step-down methods, and several of the methods we have so far considered 
(Holm, Sidak, resampling-based multiple testing) are of the traditional step- 
down category. However, the sequential rejection principle is not limited to 
applications within this class of methods, but may also be used to good 
effect in combination with methods in the step-up category. Step-up meth- 
ods are usually presented as methods that sequentially accept hypotheses, 
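rather than sequentially rejecting them. We present an alternative,
sequentially rejective view of step-up methods, as follows. Suppose that for
every $\mathcal{R} \subseteq \mathcal{H}$, we choose a sequence of ordered
critical values

(19) $$\alpha_1(\mathcal{R}) \ge \alpha_2(\mathcal{R}) \ge \cdots \ge \alpha_{|\mathcal{H} \setminus \mathcal{R}|}(\mathcal{R})$$

and define the function

(20) $$\mathcal{N}(\mathcal{R}) = \bigcup\{\mathcal{K} \subseteq \mathcal{H} \setminus \mathcal{R} : p_H \le \alpha_{|\mathcal{H} \setminus (\mathcal{R} \cup \mathcal{K})|+1}(\mathcal{R}) \text{ for every } H \in \mathcal{K}\}.$$

This function says that after having rejected a collection of hypotheses
$\mathcal{R}$, we proceed in the next step of the procedure to reject all
hypotheses in $\mathcal{H} \setminus \mathcal{R}$ with p-value at most
$\alpha_k(\mathcal{R})$ whenever there are at least
$|\mathcal{H} \setminus \mathcal{R}| - k + 1$ of those, and it does this for
$k = 1, \ldots, |\mathcal{H} \setminus \mathcal{R}|$ simultaneously.
Equivalently, in terms of ordered p-values, for
$k = 1, \ldots, |\mathcal{H} \setminus \mathcal{R}|$, it rejects the
hypothesis with the $k$th largest p-value if that p-value is at most
$\alpha_k(\mathcal{R})$ and rejects every null hypothesis with a p-value
smaller than that of any rejected hypothesis.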

By Theorem 1, the sequential rejection procedure based on (20) controls 
the familywise error if it fulfills the monotonicity and single-step conditions. 
We have summarized this result in the following corollary. 

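Corollary 1. If, whenever $\mathcal{R} \subseteq \mathcal{S}$, for
$i = 1, \ldots, |\mathcal{H} \setminus \mathcal{S}|$, we have

(21) $$\alpha_i(\mathcal{R}) \le \alpha_i(\mathcal{S})$$

and, for every $M \in \mathbb{M}$,

(22) $$P_M\Big(\bigcup_{\emptyset \ne \mathcal{K} \subseteq \mathcal{T}(M)} \{p_H \le \alpha_{|\mathcal{T}(M) \setminus \mathcal{K}|+1}(\mathcal{F}(M)) \text{ for every } H \in \mathcal{K}\}\Big) \le \alpha,$$

then the sequentially rejective procedure based on (20) satisfies

$$P_M(\mathcal{R}_\infty \subseteq \mathcal{F}(M)) \ge 1 - \alpha.$$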

Procedures based on Corollary 1 can be seen as having an inner and an 
outer loop. The inner loop performs familiar step-up testing; the outer loop 
recalibrates the critical values of the step-up procedure based on rejections 
in the inner loop. 
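
The inner and outer loops can be sketched as follows. This is an illustrative Python sketch under our own assumptions: the names `next_rejections` and `sequential_step_up` are hypothetical, p-values are passed as a dictionary, and the caller supplies a function `alpha_i(i, rejected)` that returns critical values nonincreasing in `i`. The rule `restricted` in the usage example is likewise a hypothetical stand-in for a situation with restricted combinations in which at most one hypothesis can be true once any hypothesis is false.

```python
def next_rejections(pvalues, rejected, alpha_i):
    """Inner loop: one step-up pass over the hypotheses not yet rejected."""
    remaining = [h for h in pvalues if h not in rejected]
    ordered = sorted(remaining, key=lambda h: pvalues[h])
    m = len(ordered)
    for size in range(m, 0, -1):               # largest candidate set first
        threshold = alpha_i(m - size + 1, rejected)
        if pvalues[ordered[size - 1]] <= threshold:
            return {h for h in remaining if pvalues[h] <= threshold}
    return set()


def sequential_step_up(pvalues, alpha_i):
    """Outer loop: recalibrate the critical values after each step-up pass."""
    rejected = frozenset()
    while True:
        new = next_rejections(pvalues, rejected, alpha_i)
        if not new:
            return set(rejected)
        rejected = rejected | new


def hochberg(i, rejected):
    # Constants alpha / i that ignore earlier rejections (alpha = 0.05).
    return 0.05 / i


def restricted(i, rejected):
    # Hypothetical rule: after any rejection, at most one hypothesis is true.
    return 0.05 / (min(i, 1) if rejected else i)


if __name__ == "__main__":
    p = {"H1": 0.01, "H2": 0.03, "H3": 0.30}
    print(sorted(sequential_step_up(p, hochberg)))
    print(sorted(sequential_step_up(p, restricted)))
```

With constants that do not depend on the rejected set, the outer loop stops after one extra pass and the result coincides with a single step-up pass; the sequential gain appears only when `alpha_i` uses its second argument.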

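Proof of Corollary 1. We prove the corollary by checking the conditions of
Theorem 1. The single-step condition (3) is immediate from (22). We proceed
to check monotonicity (2). Choose some
$\mathcal{R} \subseteq \mathcal{S} \subset \mathcal{H}$. Monotonicity holds
trivially if $\mathcal{N}(\mathcal{R}) = \emptyset$, so we may suppose that
$\mathcal{N}(\mathcal{R}) \ne \emptyset$. Let
$H \in \mathcal{N}(\mathcal{R})$: we have to show that $H$ belongs either to
$\mathcal{N}(\mathcal{S})$ or to $\mathcal{S}$. By definition of
$\mathcal{N}(\mathcal{R})$, there is some
$\mathcal{K} \subseteq \mathcal{H} \setminus \mathcal{R}$ such that
$H \in \mathcal{K}$ and
$p_J \le \alpha_{|\mathcal{H} \setminus (\mathcal{R} \cup \mathcal{K})|+1}(\mathcal{R})$
for every $J \in \mathcal{K}$. By (19) and (21), we have

$$\alpha_{|\mathcal{H} \setminus (\mathcal{R} \cup \mathcal{K})|+1}(\mathcal{R}) \le \alpha_{|\mathcal{H} \setminus (\mathcal{R} \cup \mathcal{K})|+1}(\mathcal{S}) \le \alpha_{|\mathcal{H} \setminus (\mathcal{S} \cup \mathcal{K})|+1}(\mathcal{S}).$$

We have either $H \in \mathcal{S}$ or $H \notin \mathcal{S}$. When
$H \notin \mathcal{S}$, we have
$H \in \mathcal{K} \setminus \mathcal{S} = \mathcal{K}'$. Then
$\mathcal{K}' \subseteq \mathcal{H} \setminus \mathcal{S}$ is such that
$p_J \le \alpha_{|\mathcal{H} \setminus (\mathcal{S} \cup \mathcal{K})|+1}(\mathcal{S}) = \alpha_{|\mathcal{H} \setminus (\mathcal{S} \cup \mathcal{K}')|+1}(\mathcal{S})$
for every $J \in \mathcal{K}'$, thus $H \in \mathcal{N}(\mathcal{S})$. □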

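The simplest nonsequential application of Corollary 1 is the method of
Hochberg (1988). This method assumes that the inequality of Simes (1986)
holds for the collection of true null hypotheses $\mathcal{T}(M)$ so that

$$P_M\Big(\bigcup_{\emptyset \ne \mathcal{K} \subseteq \mathcal{T}(M)} \bigcap_{H \in \mathcal{K}} \Big\{p_H \le \frac{\alpha \cdot |\mathcal{K}|}{|\mathcal{T}(M)|}\Big\}\Big) \le \alpha.$$

This inequality holds for independent tests, but also under some types of
dependence [Sarkar (1998)]. In Hochberg's method,
$\alpha_i(\mathcal{R}) = \alpha/i$ for every $\mathcal{R}$. Monotonicity is
straightforward for this method, and the single-step condition (22) follows
immediately from Simes' inequality because

$$\frac{\alpha}{|\mathcal{T}(M) \setminus \mathcal{K}| + 1} \le \frac{\alpha \cdot |\mathcal{K}|}{|\mathcal{T}(M)|}$$

if $\mathcal{K} \subseteq \mathcal{T}(M)$ is nonempty.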
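
The inequality used in this last step can be checked directly. Writing $i$ for the size of the subset of true null hypotheses under consideration and $n = |\mathcal{T}(M)|$ (shorthand introduced here only for this check), with $1 \le i \le n$, the left-hand critical value is $\alpha/(n - i + 1)$ and the Simes bound is $\alpha i/n$:

```latex
\frac{\alpha}{n - i + 1} \le \frac{\alpha\, i}{n}
\iff n \le i\,(n - i + 1)
\iff 0 \le i n - i^2 + i - n = (i - 1)(n - i).
```

Both factors on the right are nonnegative for $1 \le i \le n$, so the inequality holds, with equality exactly when $i = 1$ or $i = n$.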

The value of the embedding of Hochberg's method into the sequential
rejection framework is most obvious when we consider logically related
hypotheses. Hommel (1988) already remarked that if it is known a priori that
$|\mathcal{T}(M)| \le k < |\mathcal{H}|$, then the critical values of
Hochberg's method can be relaxed to
$\alpha_i(\mathcal{R}) = \alpha/\min(i, k)$. This can be easily seen from the
condition (22) by realizing that this condition does not involve
$\alpha_i(\mathcal{R})$ for $i > |\mathcal{T}(M)|$, so this value can be
chosen freely. Such a relaxation of the critical values is particularly
useful if the step-up procedure is embedded in a further sequentially
rejective procedure, for example, in the case of three hypotheses, one that
first tests a global null hypothesis $H_1 \cap H_2 \cap H_3$ at level
$\alpha$ before testing $H_1$, $H_2$ and $H_3$ in a step-up fashion. By the
sequential rejection principle, such a test procedure may proceed at the
second stage, assuming that the global null hypothesis is false.

Truly sequential results may be obtained in other situations with
restricted combinations [Hochberg and Rom (1995)] if we let the critical
values of the step-up procedure depend on the set of previous rejections. We
can define a step-up analogue to Shaffer's S2 method (14), defining

(23) $$\alpha_i(\mathcal{R}) = \frac{\alpha}{\min(i, \max\{|\mathcal{T}(M)| : \mathcal{T}(M) \cap \mathcal{R} = \emptyset\})}.$$

Strong control for this method follows from Corollary 1. Monotonicity for
this method is trivial and the single-step condition still follows
immediately from Simes' inequality.

We give two simple examples with restricted combinations in which this
method is more powerful than the regular method of Hochberg. First, consider
the case of testing all pairwise comparisons and take the situation with
three hypotheses $H_{12}: \mu_1 = \mu_2$, $H_{23}: \mu_2 = \mu_3$ and
$H_{13}: \mu_1 = \mu_3$ as an example. In this case, $|\mathcal{T}(M)|$ can
only take the values 0, 1 or 3. The test statistics may conform to Simes'
inequality, for example, if the data for each null hypothesis come from
independent studies. Hochberg's procedure would reject if all three
hypotheses have p-values at most $\alpha$, if any two hypotheses have
p-values at most $\alpha/2$ or if any single hypothesis has p-value at most
$\alpha/3$. The sequentially rejective step-up procedure defined in (23)
would, if Hochberg's procedure made only a single rejection, additionally
reject either of the remaining hypotheses if it had a p-value of at most
$\alpha$. Second, consider testing the three hypotheses
$H_1: \mu_1 \le 0$, $H_2: \mu_2 \le 0$ and $H_3: \mu_1 + \mu_2 \le 0$. If the
respective test statistics $T_1$ and $T_2$ for $H_1$ and $H_2$ are
independent and normally distributed, and we use $T_3 = T_1 + T_2$ as test
statistic for $H_3$, then the test statistics conform to the conditions of
Sarkar (1998) so that Simes' inequality may be used. Note that falsehood of
$H_3$ implies falsehood of at least one of $H_1$ and $H_2$. Therefore, if
the p-value of $H_3$ is below $\alpha/3$ and the p-value of one of $H_1$ or
$H_2$ is between $\alpha/2$ and $\alpha$, but the p-value of the other is
above $\alpha$, then the sequential method based on (23) would reject two
hypotheses, whereas Hochberg's procedure would reject only one.

6. Gatekeeping and graph-based testing. Multiple testing methods may also
be used in a situation in which hypotheses are not exchangeable, but where
interest in one hypothesis is conditional on the rejection of other
hypotheses. This is an area of extensive recent interest, both in clinical
trials and in genomics research. In this section, we review gatekeeping and
graph-based testing procedures, and demonstrate how the sequential rejection
principle may be applied to uniformly improve upon existing methods.

6.1. Gatekeeping. Gatekeeping strategies [see Dmitrienko and Tamhane
(2007) for an overview] are popular in clinical trials, in which often
multiple primary, secondary and possibly tertiary endpoints are considered.
In a gatekeeping strategy, the null hypotheses in $\mathcal{H}$ are divided
into $k$ families, $\mathcal{G}_1, \ldots, \mathcal{G}_k$, each
$\mathcal{G}_i \subseteq \mathcal{H}$. Hypotheses in a family
$\mathcal{G}_{i+1}$ are not tested before at least one hypothesis in the
family $\mathcal{G}_i$ has been rejected.

Gatekeeping strategies are sequential in a very natural way [Dmitrienko
et al. (2006)] and they are easily fitted into the framework of the general
sequentially rejective procedure. We illustrate this for the basic
unweighted serial [Westfall and Krishen (2001)] and parallel [Dmitrienko,
Offen and Westfall (2003)] gatekeeping strategies for two families,
$\mathcal{G}_1$ and $\mathcal{G}_2$.

The standard serial gatekeeping procedure uses Holm in the first family 
and Holm in the second family, testing the second family only after the first 
has been completely rejected. It can be defined as a sequentially rejective 
procedure with the critical value function 

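$$\alpha_H(\mathcal{R}) = \begin{cases}
\alpha/|\mathcal{G}_1 \setminus \mathcal{R}|, & \text{if } H \in \mathcal{G}_1, \\
\alpha/|\mathcal{G}_2 \setminus \mathcal{R}|, & \text{if } H \in \mathcal{G}_2 \text{ and } \mathcal{G}_1 \subseteq \mathcal{R}, \\
0, & \text{otherwise.}
\end{cases}$$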

Both the monotonicity and single-step conditions are trivially checked for 
this procedure. 

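The usual parallel gatekeeping procedure for two families $\mathcal{G}_1$
and $\mathcal{G}_2$ uses Bonferroni in the first family and Holm in the
second. It starts testing the second family whenever at least one hypothesis
in the first family has been rejected, but tests the second family at a
reduced level if not all hypotheses in $\mathcal{G}_1$ have been rejected.
This procedure can be defined with the critical value function

$$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha}{|\mathcal{G}_1|}, & \text{if } H \in \mathcal{G}_1, \\
\dfrac{\alpha \cdot |\mathcal{R} \cap \mathcal{G}_1|}{|\mathcal{G}_2 \setminus \mathcal{R}| \cdot |\mathcal{G}_1|}, & \text{if } H \in \mathcal{G}_2.
\end{cases}$$

Monotonicity is again trivial. The single-step condition follows from the
Bonferroni inequality (13), writing

$$\sum_{H \in \mathcal{H} \setminus \mathcal{R}} \alpha_H(\mathcal{R}) = \sum_{H \in \mathcal{G}_1 \setminus \mathcal{R}} \frac{\alpha}{|\mathcal{G}_1|} + \sum_{H \in \mathcal{G}_2 \setminus \mathcal{R}} \frac{\alpha \cdot |\mathcal{R} \cap \mathcal{G}_1|}{|\mathcal{G}_2 \setminus \mathcal{R}| \cdot |\mathcal{G}_1|} \le \alpha.$$

It is clear from this equation that there is the potential for a gain in
power for the procedure in the situation where
$\mathcal{G}_2 \subseteq \mathcal{R}$ and
$\mathcal{G}_1 \not\subseteq \mathcal{R}$ because, in that case, the
inequality on the right-hand side is a strict inequality. We may set
$\alpha_H(\mathcal{R}) = \alpha/|\mathcal{G}_1 \setminus \mathcal{R}|$ for
$H \in \mathcal{G}_1$ if $\mathcal{G}_2 \subseteq \mathcal{R}$ without losing
the single-step condition. This has also been noted by Guilbaud (2007).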
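
Both gatekeeping strategies can be run through one generic loop once their critical value functions are written down. The Python sketch below is illustrative only: the function names are hypothetical, and the parallel critical values are our reading of the description above (Bonferroni in the first family, Holm in the second at a level proportional to the number of first-family rejections), with the `improved` flag switching on the relaxation for the first family once the second family is fully rejected.

```python
def sequential_rejection(pvalues, critical_value):
    """Reject every unrejected H with p_H <= alpha_H(R) until nothing changes."""
    rejected = set()
    while True:
        new = {h for h in pvalues
               if h not in rejected
               and pvalues[h] <= critical_value(h, rejected)}
        if not new:
            return rejected
        rejected |= new


def serial_gatekeeper(g1, g2, alpha):
    """Holm in g1; Holm in g2 only after all of g1 has been rejected."""
    def critical_value(h, rejected):
        if h in g1:
            return alpha / len(g1 - rejected)
        if h in g2 and g1 <= rejected:
            return alpha / len(g2 - rejected)
        return 0.0
    return critical_value


def parallel_gatekeeper(g1, g2, alpha, improved=False):
    """Bonferroni in g1; Holm in g2 at a level scaled by rejections in g1."""
    def critical_value(h, rejected):
        if h in g1:
            if improved and g2 <= rejected:
                return alpha / len(g1 - rejected)   # relaxed level for g1
            return alpha / len(g1)
        return alpha * len(rejected & g1) / (len(g2 - rejected) * len(g1))
    return critical_value


if __name__ == "__main__":
    g1, g2 = frozenset({"A", "B"}), frozenset({"C", "D"})
    p = {"A": 0.01, "B": 0.04, "C": 0.012, "D": 0.02}
    print(sorted(sequential_rejection(p, serial_gatekeeper(g1, g2, 0.05))))
    print(sorted(sequential_rejection(p, parallel_gatekeeper(g1, g2, 0.05))))
    print(sorted(sequential_rejection(
        p, parallel_gatekeeper(g1, g2, 0.05, improved=True))))
```

The critical value functions are only ever evaluated for hypotheses that are still unrejected, so the set differences in the denominators are never empty.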

Versions of the gatekeeping procedure for more than two families, as well
as weighted versions, are easily formulated within the sequential rejection
framework and the conditions of Theorem 1 are easy to check. The same holds
for the many recent extensions and variants of gatekeeping [Dmitrienko
et al. (2007), Edwards and Madsen (2007), Guilbaud (2007), Dmitrienko,
Tamhane and Wiens (2008)]. Earlier generalizations of the class of
gatekeeping procedures, such as that of Hommel, Bretz and Maurer (2007), did
not include the case of logically related hypotheses, such as are present,
for example, in the procedure of Edwards and Madsen (2007).

6.2. Graph-based procedures. Our main motivation for the development 
of the sequential rejection principle has been our interest in the develop- 
ment of multiple testing procedures for graph-structured hypotheses. Mul- 
tiple testing in graphs is a subject of great interest, both for applications in 
clinical trials and in genomics. Specific procedures for controlling the fami- 
lywise error for graph-structured hypotheses have been proposed by several 
authors. Examples include the fallback procedure [Wiens and Dmitrienko 
(2005)], which redistributes the alpha allocated to rejected hypotheses to 
neighboring hypotheses, the method of Meinshausen (2008), which sequen- 
tially tests hypotheses ordered in a hierarchical clustering graph, the focus 
level method [Goeman and Mansmann (2008)], which combines Holm's pro- 
cedure with closed testing for hypotheses in a partially closed directed acyclic 
graph, and the method of Rosenbaum (2008), which sequentially tests or- 
dered hypotheses. All of these methods can be formulated as special cases 
of the sequentially rejective multiple testing procedure (12) that control the 
familywise error with Theorem 1 and the Bonferroni-Shaffer inequality (13). 

Several authors [Dmitrienko et al. (2007), Hommel, Bretz and Maurer 
(2007), Bretz et al. (2009), Burman, Sonesson and Guilbaud (2009)] have 
proposed general methods for recycling the alpha in graph-structured hy- 
pothesis testing, using very general graph structures. These methods can 
be seen as special cases of the sequential rejection principle, all basing their 
single-step condition on the weaker right-hand side inequality of (13). In par- 
ticular, we mention the powerful graphical approaches of Bretz et al. (2009) 
and Burman, Sonesson and Guilbaud (2009), which are very easy to inter- 
pret and communicate, and are flexible enough to cover diverse methods 
such as gatekeeping, fixed sequence and fallback procedures. The authors 
of these papers structure the tests in gatekeeping procedures in a directed 
graph with weighted edges. An initial distribution of alpha is chosen and, 
once a hypothesis is rejected, the alpha allocated to the rejected hypoth- 
esis is redistributed according to the graph. The graphical visualization of 
the testing procedure increases the understanding of how a testing strategy 
works and is a useful tool for developing, as well as communicating, pro- 
cedures. However, these methods cannot make use of logical relationships 
between hypotheses and, as such, do not incorporate graph-based meth- 
ods which exploit such relationships, such as those of Edwards and Madsen 
(2007), Goeman and Mansmann (2008) and Meinshausen (2008).

6.3. Testing in trees. To illustrate the ease with which multiple testing
procedures in graphs can be formulated and improved, we consider the case of
the tree-based method of Meinshausen (2008). Every node in the tree
corresponds to a null hypothesis to be tested. We assume that logical
relationships exist between the hypotheses in the tree, in the sense that
each parent hypothesis is the intersection of its child hypotheses: if
$\mathrm{children}(H) \ne \emptyset$, we have

$$H = \bigcap \mathrm{children}(H).$$

Tree-structured hypotheses of this type may arise if a general research
question is repeatedly split up into more specific sub-questions.

Meinshausen (2008) proposed a simple test procedure for tree structures and
a more advanced one which exploits the logical relationships between the
hypotheses in the manner of Shaffer (1986). We shall discuss both methods in
turn and show how they can be improved using the sequential rejection
principle.

The simple method would start testing the hypothesis at the top of the tree
of Figure 1 at level $\alpha$ and, after rejection of that hypothesis, would
continue testing both child nodes at level $\alpha/2$. If one of these child
nodes gets rejected, its child nodes are then tested at level $\alpha/4$.
The procedure continues until no further rejection is achieved. For general
trees, this procedure is easily represented in the sequential rejection
framework by the critical value function

$$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot L_H}{L}, & \text{if } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise,}
\end{cases}$$

where $L_H$ is the number of descendant leaves of a hypothesis $H$ and
$L = |\mathcal{L}|$ is the total number of leaves $\mathcal{L}$ in the
graph. Call a hypothesis $H$ "active" if it has $\alpha_H(\mathcal{R}) > 0$
and is not rejected.

Fig. 1. A symmetric binary tree of four levels.

Control of the familywise error can easily be checked by the sequential
rejection principle. Monotonicity of critical values is immediate from the
definition. To check the single-step condition, note that we only need to
consider control for those rejected sets $\mathcal{R}$ which are equal to
$\mathcal{F}(M)$ for some $M \in \mathbb{M}$. Due to the logical
relationships between the hypotheses, every $\mathcal{F}(M)$ is a subtree
and the active hypotheses are the children of the leaves of this subtree.
The set of active leaves of the original tree and the sets of descendant
leaves below each active hypothesis are, therefore, all disjoint and the
union of these sets contains exactly the
$L'(\mathcal{R}) = |\mathcal{L} \setminus \mathcal{R}|$ unrejected leaves, so

$$\sum_{H \in \mathcal{H} \setminus \mathcal{R}} \alpha_H(\mathcal{R}) \le \frac{\alpha \cdot L'(\mathcal{R})}{L} \le \alpha.$$

This proves the single-step condition for Meinshausen's basic procedure.

From the inequality above, we can immediately see that we can set uni- 
formly sharper critical values without loss of the single-step condition by 
setting 

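(24) $$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot L_H}{L'(\mathcal{R})}, & \text{if } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise,}
\end{cases}$$

using the number $L'(\mathcal{R})$ of unrejected leaves, rather than the number of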
original leaves, in the denominator. This improvement is analogous to the 
improvement from the procedure of Bonferroni to the procedure of Holm. 

The two procedures outlined above do not yet make effective use of the 
logical relationships between the null hypotheses in the graph. One way, 
proposed by Meinshausen (2008), to make use of those, is to use what he 
calls the Shaffer improvement. To keep notation simple for this improvement, 
consider only the case of a symmetric binary tree, which is a tree with a 
single root, in which every node has zero or two child nodes, and in which 
the subtrees formed by the descendants of child nodes of the same parent 
are identical (see Figure 1). For such a tree, Meinshausen proposed to use 

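$$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot L_H}{L}, & \text{if } H \notin \mathcal{L} \text{ and } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
\dfrac{2\alpha \cdot L_H}{L}, & \text{if } H \in \mathcal{L} \text{ and } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise.}
\end{cases}$$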



The critical value function is identical to the first critical value function for 
all hypotheses that are not leaves, but multiplies all critical values of leaf 
node hypotheses by 2. 

Control of the familywise error for this procedure follows from the
sequential rejection principle in much the same way as above. To see why the
factor 2 can be applied, note that when checking the single-step condition,
we may assume that all rejections in $\mathcal{R}$ are correct rejections.
In particular, once we have rejected a parent of a leaf node, because that
hypothesis is the intersection of its two child hypotheses, we may assume
that at least one of its children is false. Therefore, in the single-step
condition, when calculating a bound for
$\sum_{H \in \mathcal{T}(M)} \alpha_H(\mathcal{R})$, only one out of each
pair of leaf nodes with common parent contributes to the sum.

It is convenient to rewrite the critical value function in terms of these
pairs. Let $P_H$ be the number of leaf node pairs that either include $H$ or
are descendants of $H$ so that $P_H = L_H/2$ if $H$ is not a leaf and
$P_H = L_H = 1$ if $H$ is a leaf. Let $P = L/2$ be the total number of leaf
node pairs. We can then write

$$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot P_H}{P}, & \text{if } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise.}
\end{cases}$$

Consider the set of true null hypotheses that are active. Note that, by the 
same reasoning as above, each leaf node pair has at most one member or 
ancestor in that set and leaf node pairs which have been completely rejected 
by the procedure have no member or ancestor in that set. Therefore,

$$\sum_{H \in \mathcal{T}(M)} \alpha_H(\mathcal{R}) \le \frac{\alpha \cdot P'(\mathcal{R})}{P} \le \alpha,$$

where $P'(\mathcal{R})$ is the number of leaf node pairs that have not yet
been completely rejected. This proves the single-step condition for
Meinshausen's method with Shaffer's adjustment.

Again, we see that it is possible to set uniformly sharper critical values 
without loss of the single-step condition, setting 

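(25) $$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot P_H}{P'(\mathcal{R})}, & \text{if } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise,}
\end{cases}$$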

which provides a uniform improvement. 

A second way to make use of logical relationships in Meinshausen's
procedure is to remark that the procedure (24) is inadmissible according to
the criterion (5) and may be improved on the basis of that criterion. This
improvement is for general trees. We note that the single-step condition
only needs to be shown for sets $\mathcal{R} \in \Phi$ and that
$\mathcal{R} \in \Phi$ implies that for every $H \in \mathcal{R}$, there is
always at least one leaf $K \in \mathrm{offspring}(H)$ for which
$K \in \mathcal{R}$. Therefore, define

$$\mathcal{D} = \{H \in \mathcal{R} : \mathrm{offspring}(H) \cap \mathcal{R} = \emptyset\},$$

the leaf nodes of the rejected subgraph, and define
$L''(\mathcal{R}) = L - |\mathcal{D}|$. Noting that
$L''(\mathcal{R}) = L'(\mathcal{R})$ if $\mathcal{R} \in \Phi$ and that
$L''(\mathcal{R}) \ge L'(\mathcal{S})$ for every
$\mathcal{R} \subseteq \mathcal{S} \in \Phi$ if $\mathcal{R} \notin \Phi$, we
see that (24) can be changed to

(26) $$\alpha_H(\mathcal{R}) = \begin{cases}
\dfrac{\alpha \cdot L_H}{L''(\mathcal{R})}, & \text{if } \mathrm{ancestors}(H) \subseteq \mathcal{R}, \\
0, & \text{otherwise,}
\end{cases}$$

without losing familywise error control. This is a uniform improvement over
(24) because $L''(\mathcal{R}) < L'(\mathcal{R})$ for all
$\mathcal{R} \notin \Phi$. It is easy to check that (26) conforms to the
condition (5).
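
The three leaf-count denominators discussed in this subsection (the total number of leaves, the number of unrejected leaves, and the total minus the number of leaf nodes of the rejected subgraph) can be compared on a small tree. The Python sketch below is illustrative only and rests on our own assumptions: the function and variant names are hypothetical, the tree is given as a dictionary from each internal node to its children, and every internal node is assumed to have at least two children.

```python
def leaves_below(tree, node):
    """Leaf nodes that are `node` itself or descendants of it."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_below(tree, child)
    return result


def tree_procedure(tree, root, pvalues, alpha=0.05, variant="basic"):
    """Sequential rejection in a tree with critical values alpha * L_H / denom."""
    if variant not in ("basic", "unrejected_leaves", "rejected_frontier"):
        raise ValueError("unknown variant")
    parent = {c: n for n, kids in tree.items() for c in kids}
    n_leaves = {h: len(leaves_below(tree, h)) for h in pvalues}   # L_H
    all_leaves = leaves_below(tree, root)
    rejected = set()

    def ancestors_rejected(h):
        while h in parent:
            h = parent[h]
            if h not in rejected:
                return False
        return True

    def denominator():
        if variant == "basic":
            return len(all_leaves)                    # all original leaves
        if variant == "unrejected_leaves":
            return len(all_leaves - rejected)         # unrejected leaves
        # Rejected nodes without rejected children: since a node is only
        # tested after its ancestors are rejected, these are the leaf
        # nodes of the rejected subgraph.
        frontier = [h for h in rejected
                    if not any(c in rejected for c in tree.get(h, []))]
        return len(all_leaves) - len(frontier)

    while True:
        denom = denominator()
        # p_H <= alpha * L_H / denom, written without division.
        new = {h for h in pvalues
               if h not in rejected and ancestors_rejected(h)
               and pvalues[h] * denom <= alpha * n_leaves[h]}
        if not new:
            return rejected
        rejected |= new


if __name__ == "__main__":
    tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
    p = {"root": 0.001, "A": 0.02, "B": 0.03,
         "a1": 0.014, "a2": 0.5, "b1": 0.5, "b2": 0.5}
    for v in ("basic", "unrejected_leaves", "rejected_frontier"):
        print(v, sorted(tree_procedure(tree, "root", p, variant=v)))
```

In this example the third variant gains power immediately after the root is rejected, because the rejected root counts as one leaf of the rejected subgraph even though no original leaf has been rejected yet.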

It is interesting to note that the two improvements (25) and (26) of Mein- 
shausen's method do not dominate each other. This suggests that many 
extensions, variants and alternative improvements are possible, but these 
are beyond the scope of this paper. 

We also remark that the variant of Meinshausen's procedure without Shaf- 
fer's improvement is a special case of the methods of Burman, Sonesson and 
Guilbaud (2009) and Bretz et al. (2009). The improvement (24) might have 
been obtained in an easy way using the approaches of these authors and is 
also valid in the absence of logical relationships between hypotheses. The 
methods (25) and (26) that exploit logical relationships, however, are not 
contained in the frameworks of Bretz et al. (2009) and Burman, Sonesson 
and Guilbaud (2009), and require the use of the sequential rejection princi- 
ple. 

7. Multiplicity-adjusted p-values. Often in multiple testing situations,
interest is not just in rejection and nonrejection of hypotheses at a
prespecified level $\alpha$, but also in reporting multiplicity-adjusted
p-values. Such multiplicity-adjusted p-values are defined for each null
hypothesis as the smallest $\alpha$-level that allows rejection of that
hypothesis. In the general sequentially rejective procedure, they can easily
be found using the following algorithm, described earlier by Goeman and
Mansmann (2008) for the specific case of the focus level procedure.

Suppose the critical value function $c$ depends on a parameter $\alpha$ in
such a way that (1) the sequentially rejective procedure based on $c_\alpha$
controls the familywise error at level at most $\alpha$, and that (2) for
all $H$ and $\mathcal{R}$,
$c_{H,\alpha_1}(\mathcal{R}) \ge c_{H,\alpha_2}(\mathcal{R})$ if
$\alpha_1 \le \alpha_2$, that is, critical values are nonincreasing in
$\alpha$. Multiplicity-adjusted p-values can then be calculated in the
following way.

Initialize $\alpha_0 = 0$ and $\mathcal{R}^0_\infty = \emptyset$. Iterate
for $i = 1, 2, \ldots$

1. Set $\alpha_i$ to the smallest $\alpha$ for which
$S_H \ge c_{H,\alpha}(\mathcal{R}^{i-1}_\infty)$ for any
$H \in \mathcal{H} \setminus \mathcal{R}^{i-1}_\infty$.

2. Follow the sequentially rejective procedure with the critical value
function $c_{\alpha_i}$, starting from
$\mathcal{R}^i_0 = \mathcal{R}^{i-1}_\infty$, to find $\mathcal{R}^i_\infty$.

3. Set the multiplicity-adjusted p-values of all
$H \in \mathcal{R}^i_\infty \setminus \mathcal{R}^{i-1}_\infty$ to $\alpha_i$.

The procedure can be stopped when either $\mathcal{R}^i_\infty = \mathcal{H}$
or when $\alpha_i \ge 1$. If the latter happens, all
$H \in \mathcal{H} \setminus \mathcal{R}^{i-1}_\infty$ can be given
multiplicity-adjusted p-value 1.

In step 2 of this algorithm, the sequentially rejective procedure for the 
next higher value of a starts from the final rejected set of the previous value 
of a. This is what makes the algorithm relatively efficient. It is interesting 
to note that this "warm start" is allowed as another consequence of the 
monotonicity condition (7): if that condition holds, then the sequentially 
rejective procedure that starts at $\mathcal{R}^i_0 = \mathcal{R}^{i-1}_\infty$
converges to the same final rejected set as the sequentially rejective
procedure that starts at $\mathcal{R}_0 = \emptyset$.
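
The algorithm, including its warm start, can be sketched for the special case in which the procedure rejects $H$ when its raw p-value is at most $\alpha$ times a weight that depends on the current rejected set. That restriction is an assumption made here to keep step 1 in closed form; it covers Holm's procedure, for which the weight is one over the number of unrejected hypotheses. The function names are hypothetical.

```python
import math


def adjusted_pvalues(pvalues, weight):
    """Adjusted p-values for critical values of the form alpha * weight(h, R)."""
    def level_needed(h, rejected):
        # Smallest alpha at which h can be rejected given the rejected set.
        w = weight(h, rejected)
        return pvalues[h] / w if w > 0 else math.inf

    adjusted, rejected = {}, set()
    while len(rejected) < len(pvalues):
        # Step 1: smallest alpha that allows at least one new rejection.
        alpha_i = min(level_needed(h, rejected)
                      for h in pvalues if h not in rejected)
        if alpha_i >= 1:
            break
        # Step 2: run the procedure at level alpha_i, warm-started from the
        # final rejected set of the previous level.
        current = set(rejected)
        while True:
            new = {h for h in pvalues
                   if h not in current
                   and level_needed(h, current) <= alpha_i}
            if not new:
                break
            current |= new
        # Step 3: newly rejected hypotheses get adjusted p-value alpha_i.
        for h in current - rejected:
            adjusted[h] = alpha_i
        rejected = current
    for h in pvalues:
        adjusted.setdefault(h, 1.0)     # never rejected below level 1
    return adjusted


def holm_weight(n_total):
    """Holm: each unrejected hypothesis is tested at alpha / |H \\ R|."""
    return lambda h, rejected: 1.0 / (n_total - len(rejected))


if __name__ == "__main__":
    p = {"A": 0.01, "B": 0.03, "C": 0.04, "D": 0.5}
    print(adjusted_pvalues(p, holm_weight(len(p))))
```

Comparing `level_needed` with `alpha_i`, rather than the p-value with a recomputed critical value, guarantees that the hypothesis attaining the minimum in step 1 is rejected in step 2 despite floating-point rounding, so the outer loop always makes progress.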

8. Discussion. The sequential rejection principle is a fundamental property
of familywise error control which has been implicitly exploited in many
important methods. The sequential rejection principle links Holm's proce- 
dure to Bonferroni's. It presents both the closed testing procedure and the 
partitioning principle as consequences of Shaffer's procedure. It ties the tests 
in different families of a gatekeeping procedure together and it connects the 
step-down version of resampling-based multiple testing to the single-step 
version. The procedure is not limited in its application to step-down meth- 
ods, but can also effectively be used in the context of step-up methods, as 
we have demonstrated for Hochberg's method in the case of logically related 
null hypotheses. 

This paper has made the sequential rejection principle explicit. It shows 
how many well-known methods can be constructed as special cases of a 
general sequentially rejective multiple testing procedure, which is a mono- 
tone sequence of single-step procedures with a limited form of weak family- 
wise error control. This general procedure is interesting from a theoretical 
point of view, showing a close relatedness between seemingly different multi- 
ple testing procedures. The general procedure encompasses a great number 
of well-known sequentially rejective familywise error-controlling procedures 
and even some that have never been viewed as sequentially rejective before. 

The relationship between the sequential rejection principle and the parti- 
tioning principle deserves some attention. Even though we have shown that 
the partitioning principle may be derived as a special case of the sequen- 
tial rejection principle, we do not claim that sequential rejection is a more 
powerful or more fundamental principle than partitioning. Rather, the se- 
quential rejection principle presents an alternative perspective on multiple 
testing, which is flexible enough to include both closed testing and parti- 
tioning as special cases, but which does not always require construction of 
the full partitioning or closure of the hypotheses of interest. 

The most important aspect of the sequential rejection principle, however,
is its practical usefulness. This ranges from simple applications, such as
quickly answering the question whether any multiple testing correction is
needed for simultaneous post hoc testing of $H: \mu_1 = 0$ and $J: \mu_2 = 0$ after
$K: \mu_1 = \mu_2$ has been rejected, to the construction of multiple testing
procedures for complicated graphs. Recently, there has been considerable interest
in the latter application, both in the field of clinical trials with multiple 
endpoints and in the field of genomics. The sequential rejection principle 
can be a valuable tool in this area since the general sequentially rejective 
procedure lends itself very easily to graph-based testing, with conditions for 
strong control of the familywise error for the procedure that are intuitive 
and easy to check. The sequential rejection principle improves upon earlier 
proposals for general graph-based multiple testing procedures because it is 
capable of incorporating logical relationships between null hypotheses and 
because it is not restricted to Bonferroni-based control at each single step. 
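
The simple post hoc question mentioned above can be made concrete with a small enumeration. The sketch below is an illustration under an assumption the text does not spell out here: it uses Shaffer-type reasoning about logical constraints, namely that any two of $H$, $J$ and $K$ being true forces the third to be true. The function names are hypothetical.

```python
from itertools import product


def consistent(h, j, k):
    """Truth pattern for H: mu1 = 0, J: mu2 = 0, K: mu1 = mu2.

    Any two true hypotheses imply the third, so exactly two true is impossible.
    """
    return (h + j + k) != 2


def max_true_after_rejecting_k():
    """Largest number of true hypotheses among H and J when K is false."""
    patterns = [(h, j) for h, j, k in product((True, False), repeat=3)
                if consistent(h, j, k) and not k]
    return max(h + j for h, j in patterns)
```

Under this reasoning, at most one of $H$ and $J$ can be true once $K$ is false, so a Bonferroni-type division of $\alpha$ by the number of possibly true hypotheses leaves the level of each post hoc test at $\alpha$.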

APPENDIX A: RELAXING THE MONOTONICITY CONDITION? A COUNTEREXAMPLE

In this section, we show that the relaxed version (8) of the monotonicity
condition (7) is not sufficient for familywise error control. We do this by
first constructing a sequentially rejective procedure that conforms to (9) and
which controls the familywise error in each single step at level $\alpha$, but which
does not conform to (7). Next, we construct a data generating distribution
for which this procedure has a familywise error greater than $\alpha$. The example
is highly artificial, but it serves as an interesting counterexample to the
possibility of relaxation of the monotonicity criterion.

The sequential procedure is of gatekeeping type, with four hypotheses: $J$,
$J'$, $K$ and $K'$. The hypotheses $J$ and $K$ are primary, and the hypotheses $J'$
and $K'$ are secondary, being tested only after at least one of $J$ and $K$ has
been rejected. Suppose that we have test statistics $U_J$, $U_{J'}$, $U_K$ and $U_{K'}$,
corresponding to the four hypotheses. Suppose, also, that the general model
$\mathbb{M}$ says that, for $H \in \{J, K, J', K'\}$, each $U_H$ is marginally uniform $\mathcal{U}(0,1)$
if $H$ is true, and $\mathcal{U}(0, b_H)$ with $b_H < 1$ if $H$ is false. The test statistics are
therefore very much like p-values and, as a consequence, we would reject
each $H$ for small values of $U_H$, as in the notation of Section 3. To construct
the sequentially rejective procedure, choose some $0 < \alpha < 1/2$ and some
$0 < \varepsilon < \alpha/2$. The critical value function $\alpha(\cdot)$ of the procedure is summarized in
Table 1 for the rejection sets relevant to the first two steps of the procedure.

The single-step condition of this procedure is easily checked, as the column
sums of the table are bounded by $\alpha$. It is also immediately clear that the
procedure based on the critical value function of Table 1 does not satisfy
the monotonicity condition (7) since

(27) $\alpha_{J'}(\{J,K\}) = \tfrac{1}{2}\alpha < \alpha - \varepsilon = \alpha_{J'}(\{J\})$.

However, the procedure does satisfy the relaxed monotonicity condition (8)
since $\mathcal{R}_1 = \{J\}$ can never be followed by $\mathcal{R}_2 = \{J,K\}$, so (27) is not relevant
for that condition.
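Table 1

Critical value function $\alpha(\cdot)$ of the sequentially rejective procedure

                 Previously rejected hypotheses $\mathcal{R}$
                 $\emptyset$     $\{J\}$                  $\{K\}$                  $\{J,K\}$

$\alpha_J$       $\varepsilon$   -                        $\varepsilon$            -
$\alpha_K$       $\varepsilon$   $\varepsilon$            -                        -
$\alpha_{J'}$    0               $\alpha - \varepsilon$   0                        $\alpha/2$
$\alpha_{K'}$    0               0                        $\alpha - \varepsilon$   $\alpha/2$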



We now give an example of a distribution for which the procedure based
on the critical value function of Table 1 does not control the familywise
error. Suppose that, under the true model, $U_J$, $U_{J'}$, $U_K$ and $U_{K'}$ all depend
on a single uniform $\mathcal{U}(0,1)$ variable $U$, in such a way that

$U_J = tU$,

$U_K = t(1 - U)$,

$U_{J'} = U$,

$U_{K'} = 1 - U$

for some $2\varepsilon < t < \varepsilon/\alpha$. Note that $J'$ and $K'$ are true null hypotheses, whereas
$J$ and $K$ are false. For this distribution, $\{U < \alpha - \varepsilon\}$ implies rejection of $J$,
but not $K$, in step 1, followed by rejection of $J'$ in step 2, while, at the same
time, $\{U > 1 - \alpha + \varepsilon\}$ implies rejection of $K$, but not $J$, in step 1, followed
by rejection of $K'$ in step 2. The total probability of making a false rejection
is therefore

$\mathrm{FWER} \geq \mathrm{P}(\{U < \alpha - \varepsilon\} \cup \{U > 1 - \alpha + \varepsilon\}) = 2\alpha - 2\varepsilon > \alpha$

and we conclude that the procedure does not control the familywise error.
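
The bound can be checked numerically. The following sketch is an illustration, not part of the original argument: it fixes the hypothetical values $\alpha = 0.2$, $\varepsilon = 0.05$ and $t = 0.2$ (which satisfy the constraints above), encodes the critical values of Table 1 with a value of 0 for secondary hypotheses whose gate has not been passed, runs only the first two steps of the procedure, and replaces the uniform variable $U$ by a fine midpoint grid so that the result is deterministic.

```python
ALPHA, EPS, T = 0.2, 0.05, 0.2  # 0 < EPS < ALPHA/2 and 2*EPS < T < EPS/ALPHA

PRIMARY = {"J", "K"}
GATE = {"J'": "J", "K'": "K"}  # primary hypothesis that unlocks each secondary one


def critical_value(h, rejected):
    """Critical value of h given the rejected set, following Table 1."""
    primaries = set(rejected) & PRIMARY
    if h in PRIMARY:
        return EPS
    if primaries == PRIMARY:
        return ALPHA / 2
    return ALPHA - EPS if primaries == {GATE[h]} else 0.0


def statistics(u):
    """Test statistics as functions of the single uniform variable u."""
    return {"J": T * u, "K": T * (1 - u), "J'": u, "K'": 1 - u}


def two_step_rejections(stats):
    """First two steps of the sequentially rejective procedure."""
    rejected = set()
    for _ in range(2):
        rejected |= {h for h, value in stats.items()
                     if h not in rejected and value <= critical_value(h, rejected)}
    return rejected


def fwer_lower_bound(n=1000):
    """Fraction of grid points at which a true null (J' or K') is rejected."""
    hits = 0
    for k in range(n):
        u = (k + 0.5) / n  # midpoint grid standing in for U ~ Uniform(0, 1)
        hits += bool(two_step_rejections(statistics(u)) & {"J'", "K'"})
    return hits / n
```

With these values the grid estimate agrees with $2\alpha - 2\varepsilon = 0.3$, which exceeds $\alpha = 0.2$.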

The procedure based on Table 1 can go wrong because the critical value 
function allows the first step of the sequentially rejective procedure to pre- 
select the null hypothesis that is most likely to give a false rejection in the 
second step. The monotonicity requirement (7) prevents this, but the relaxed 
monotonicity requirement (8) does not. 



APPENDIX B: PROOF OF THEOREM 2 

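Choose any $M \in \mathbb{M}$ and let $\mathcal{F} = \mathcal{H} \setminus \mathcal{T}(M)$. Let the random variables
$\delta_1, \ldots, \delta_r$ be defined as

$$\delta_i = \begin{cases} 1, & \text{if } \max_{H \in \mathcal{T}(M)} S_H \circ \pi_i > \bigl( \max_{J \in \mathcal{T}(M)} S_J \circ \pi \circ \pi_i \bigr)_{(s)}, \\ 0, & \text{otherwise.} \end{cases}$$

By condition (17), for all $i$,

(28) $\mathrm{E}_M(\delta_i) = \mathrm{P}_M\Bigl( \bigcup_{H \in \mathcal{T}(M)} \{S_H > k_H(\mathcal{F})\} \Bigr),$

where $\mathrm{E}_M$ denotes expectation with respect to the measure $\mathrm{P}_M$.

Because $\{\pi_1, \ldots, \pi_r\}$ form a group in the algebraic sense, it follows that
for every $i$,

$$\{\pi_i \circ \pi_1, \ldots, \pi_i \circ \pi_r\} = \{\pi_1, \ldots, \pi_r\}.$$

Therefore, for every $i$,

$$\bigl( \max_{J \in \mathcal{T}(M)} S_J \circ \pi \bigr)_{(s)} = \bigl( \max_{J \in \mathcal{T}(M)} S_J \circ \pi_i \circ \pi \bigr)_{(s)}.$$

Consequently,

$$\sum_{i=1}^{r} \delta_i = \#\Bigl\{ i : \max_{H \in \mathcal{T}(M)} S_H \circ \pi_i > \bigl( \max_{J \in \mathcal{T}(M)} S_J \circ \pi \bigr)_{(s)} \Bigr\} \leq r - s \leq r\alpha$$

for all $\omega \in \Omega$. Combining this with (28), we have

$$\mathrm{P}_M\Bigl( \bigcup_{H \in \mathcal{T}(M)} \{S_H > k_H(\mathcal{F})\} \Bigr) = r^{-1} \sum_{i=1}^{r} \mathrm{E}_M(\delta_i) = \mathrm{E}_M\Bigl( r^{-1} \sum_{i=1}^{r} \delta_i \Bigr) \leq \alpha.$$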
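
The counting step of this proof can be illustrated on a toy example. The sketch below makes several assumptions that are not taken from the text: the group of transformations is the sign-flipping group on four observations, the test statistics are column sums, the data are made up, and $s$ is taken to be $\lceil (1-\alpha) r \rceil$ (this excerpt does not define $s$). It checks that the number of group elements whose maximum statistic strictly exceeds the $s$-th order statistic is at most $r - s$, and that composing every group element with a fixed one leaves the set of maxima unchanged.

```python
import itertools
import math


def max_statistic(data, flip):
    """Maximum over hypotheses (columns) of the sign-flipped column sum."""
    n_cols = len(data[0])
    return max(sum(sign * row[j] for sign, row in zip(flip, data))
               for j in range(n_cols))


def permuted_maxima(data, shift=None):
    """Maxima over the whole sign-flip group, each element composed with `shift`."""
    n = len(data)
    shift = shift or (1,) * n
    return [max_statistic(data, tuple(a * b for a, b in zip(shift, f)))
            for f in itertools.product((1, -1), repeat=n)]


def exceedance_count(data, alpha):
    """Count of maxima strictly above the s-th order statistic, with r and s."""
    values = permuted_maxima(data)
    r = len(values)
    s = math.ceil((1 - alpha) * r)  # assumed definition of s
    threshold = sorted(values)[s - 1]
    return sum(v > threshold for v in values), r, s


toy_data = [[1.2, -0.3], [0.4, 0.9], [-0.7, 1.1], [2.0, 0.5]]
count, r, s = exceedance_count(toy_data, alpha=0.2)
```

The strict inequality matters for the bound: with ties at the threshold, counting values greater than or equal to the $s$-th order statistic could exceed $r - s$.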